Skip to content

HIVE-28170: Implement drop stats#6391

Open
soumyakanti3578 wants to merge 16 commits intoapache:masterfrom
soumyakanti3578:HIVE-28170-dropstats
Open

HIVE-28170: Implement drop stats#6391
soumyakanti3578 wants to merge 16 commits intoapache:masterfrom
soumyakanti3578:HIVE-28170-dropstats

Conversation

@soumyakanti3578
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Implements drop stats for columns. There is an earlier PR for this: #5721 - its review comments have been addressed in this PR.

Why are the changes needed?

https://issues.apache.org/jira/browse/HIVE-28170

Does this PR introduce any user-facing change?

No, except the new feature of dropping column stats.

How was this patch tested?

mvn test -pl itests/qtest -Pitests -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite=true -Dqfile="drop_stats_for_columns.q,drop_histogram_stats_for_columns.q"
mvn test -pl itests/qtest -Pitests -Dtest=TestNegativeLlapCliDriver -Dtest.output.overwrite=true -Dqfile="drop_stats_for_columns_iceberg.q"

Comment on lines +9 to +17
insert into test_stats (a, b, c) values ("a", 2, 1.1);
insert into test_stats (a, b, c) values ("b", 2, 2.1);
insert into test_stats (a, b, c) values ("c", 2, 2.1);
insert into test_stats (a, b, c) values ("d", 2, 3.1);
insert into test_stats (a, b, c) values ("e", 2, 3.1);
insert into test_stats (a, b, c) values ("f", 2, 4.1);
insert into test_stats (a, b, c) values ("g", 2, 5.1);
insert into test_stats (a, b, c) values ("h", 2, 6.1);
insert into test_stats (a, b, c) values ("i", 3, 6.1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it enough to have only 2 inserts?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine for column b to have just 2 unique values as this test is not really testing the histogram but whether the column stats are accurate after dropping column stats. And that is tested by the fact that before dropping column stats, the value for COLUMN_STATS_ACCURATE was:

COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}}

and after dropping stats it is:

COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\"}

So this clearly shows that the column stats are not accurate.

However, in the latest commit I have added more variance in the inserts.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter how many values was inserted before dropping the stats? I checked the drop_histogram_stats_for_columns.q.out‎ in your last commit (Address review comments and Sonar issues) and I don't see any changes in the after drop stats part

Copy link
Copy Markdown
Contributor Author

@soumyakanti3578 soumyakanti3578 Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is dependent on the specific value of stats at all. The tests run these:

  1. create table
  2. insert values with autogather on so that stats are computed
  3. describe formatted tables and columns to check that COLUMN_STATS_ACCURATE is true for table and all columns
  4. alter table drop statistics for all columns;
  5. describe formatted tables and columns to check that COLUMN_STATS_ACCURATE is true only for table (basic stats)

The test before my last commit was fine too but I thought it's not bad to add more variance for the column although it didn't affect the test at all.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, if the test doesn't depend on the stats, then inserting more values than necessary just adds extra complexity and wastes resources. (insert is expensive).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree. I have just kept 1 insert in the latest commit.

Copy link
Copy Markdown
Contributor

@kasakrisz kasakrisz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a minor comment about the number of inserts in a test file but overall the patch LGTM

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Apr 1, 2026

@soumyakanti3578
Copy link
Copy Markdown
Contributor Author

@kasakrisz I have just kept 1 insert now and the tests have passed. If the change looks good to you, please merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants