HIVE-28170: Implement drop stats #6391
soumyakanti3578 wants to merge 16 commits into apache:master
…S to HiveOperationType to fix TestHiveOperationType
…ROP STATISTICS FOR COLUMNS [optional list of columns];
ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsDropWork.java
insert into test_stats (a, b, c) values ("a", 2, 1.1);
insert into test_stats (a, b, c) values ("b", 2, 2.1);
insert into test_stats (a, b, c) values ("c", 2, 2.1);
insert into test_stats (a, b, c) values ("d", 2, 3.1);
insert into test_stats (a, b, c) values ("e", 2, 3.1);
insert into test_stats (a, b, c) values ("f", 2, 4.1);
insert into test_stats (a, b, c) values ("g", 2, 5.1);
insert into test_stats (a, b, c) values ("h", 2, 6.1);
insert into test_stats (a, b, c) values ("i", 3, 6.1);
Is it enough to have only 2 inserts?
I think it's fine for column b to have just 2 unique values as this test is not really testing the histogram but whether the column stats are accurate after dropping column stats. And that is tested by the fact that before dropping column stats, the value for COLUMN_STATS_ACCURATE was:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}}
and after dropping stats it is:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
So this clearly shows that the column stats are no longer marked as accurate.
However, in the latest commit I have added more variance in the inserts.
Does it matter how many values were inserted before dropping the stats? I checked the drop_histogram_stats_for_columns.q.out in your last commit (Address review comments and Sonar issues) and I don't see any changes in the after-drop stats part.
I don't think it is dependent on the specific value of stats at all. The tests run these:
- create table
- insert values with autogather on so that stats are computed
- describe formatted tables and columns to check that COLUMN_STATS_ACCURATE is true for table and all columns
- alter table drop statistics for all columns;
- describe formatted tables and columns to check that COLUMN_STATS_ACCURATE is true only for table (basic stats)
The test before my last commit was fine too but I thought it's not bad to add more variance for the column although it didn't affect the test at all.
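The steps above can be sketched roughly as the following HiveQL. This is only an illustration, not the actual .q file from the PR: the column types, the two autogather settings, and the exact ALTER TABLE form (inferred from the truncated commit message "…ROP STATISTICS FOR COLUMNS [optional list of columns];") are assumptions.

```sql
-- Assumed settings so that basic and column stats are gathered on insert.
set hive.stats.autogather=true;
set hive.stats.column.autogather=true;

-- Column types are assumed from the inserted values ("a", 2, 1.1).
create table test_stats (a string, b int, c double);

-- One insert is enough for stats to be computed.
insert into test_stats (a, b, c) values ("a", 2, 1.1);

-- Before the drop: COLUMN_STATS_ACCURATE should list BASIC_STATS
-- and COLUMN_STATS for a, b and c.
describe formatted test_stats;
describe formatted test_stats a;

-- Assumed syntax; with no column list, stats for all columns are dropped.
alter table test_stats drop statistics for columns;

-- After the drop: COLUMN_STATS_ACCURATE should list only BASIC_STATS.
describe formatted test_stats;
describe formatted test_stats a;
```

The check is purely on the COLUMN_STATS_ACCURATE entries in the describe output, so the specific inserted values should not affect the result.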
IMHO, if the test doesn't depend on the stats, then inserting more values than necessary just adds extra complexity and wastes resources (inserts are expensive).
Yes, I agree. I have kept just 1 insert in the latest commit.
@kasakrisz I have just kept 1 insert now and the tests have passed. If the change looks good to you, please merge!



What changes were proposed in this pull request?
Implements drop stats for columns. There is an earlier PR for this: #5721 - its review comments have been addressed in this PR.
Why are the changes needed?
https://issues.apache.org/jira/browse/HIVE-28170
Does this PR introduce any user-facing change?
No, except the new feature of dropping column stats.
How was this patch tested?