HIVE-29617: Error while loading column statistics of Iceberg table after upgrading Hive#6496
HIVE-29617: Error while loading column statistics of Iceberg table after upgrading Hive#6496kasakrisz wants to merge 1 commit into
Conversation
…ter upgrading Hive
49c0548 to
19463c6
Compare
|
| colStats = getTableColumnStats(table, neededColumns, colStatsCache, fetchColStats); | ||
| if (estimateStats) { | ||
| estimateStatsForMissingCols(neededColumns, colStats, conf, nr, schema); | ||
| colStats = estimateStatsForMissingCols(neededColumns, colStats, conf, nr, schema); |
There was a problem hiding this comment.
I think List<ColStatistics> colStats = Collections.emptyList(); is still here. It can lead to future mistakes like trying to use it on the same way as before.
There was a problem hiding this comment.
I checked the usage of the colStats reference you mentioned and is used in read operations. I haven't found any code that tries to modify it's content.
|
|
||
| package org.apache.hadoop.hive.ql.stats; | ||
|
|
||
| import static org.junit.Assert.assertFalse; |
There was a problem hiding this comment.
I think this is unused, as highlighted by sonar too.
| ColumnInfo columnInfoA = new ColumnInfo("a", TypeInfoFactory.intTypeInfo, "t", false); | ||
|
|
||
| List<ColStatistics> allColumnStats = StatsUtils.estimateStatsForMissingCols( | ||
| List.of("a"), Collections.emptyList(), conf, 0, List.of(columnInfoA)); |
There was a problem hiding this comment.
Do you think we can have a situation where neededColumnNames is empty and existingColStats is non-empty? If yes, we should probably add a test for that too?
| assertEquals(allColumnStats.get(0), colStatNeededAndExists); | ||
| assertEquals(allColumnStats.get(1), colStatNotNeededButExists); | ||
| assertTrue(allColumnStats.get(2).isEstimated()); | ||
| assertEquals(allColumnStats.get(2).getColumnName(), colNeededButNotExists.getInternalName()); |
There was a problem hiding this comment.
nit: I think except the first assertEquals all of them have the order of arguments flipped?
| Set<String> neededCols = new HashSet<>(neededColumns); | ||
| Set<String> colsWithStats = new HashSet<>(); | ||
| Set<String> neededCols = new HashSet<>(neededColumnNames); | ||
| Set<String> columnNamesWithStats = new HashSet<>(existingColStats.size()); |
There was a problem hiding this comment.
nit: Sonar's suggestion of HashSet.newHashSet(int numMappings) is more efficient. Not a blocker for this PR.



What changes were proposed in this pull request?
Modify StatsUtils.estimateStatsForMissingCols to return a new list of column statistics instead of adding the newly estimated stats to its input parameter.
Why are the changes needed?
When reading the column statistics of an Iceberg table fails, Collections.emptyList() is returned. Because this is an unmodifiable list, the current implementation of estimateStatsForMissingCols tries to add the estimated column statistics to it and an exception is thrown.
Does this PR introduce any user-facing change?
No.
How was this patch tested?