hive-dev mailing list archives

From "Simone (JIRA)" <>
Subject [jira] [Created] (HIVE-11266) count(*) wrong result based on table statistics
Date Wed, 15 Jul 2015 10:12:04 GMT
Simone created HIVE-11266:

             Summary: count(*) wrong result based on table statistics
                 Key: HIVE-11266
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Simone
            Priority: Critical

Hive returns a wrong count(*) result on an external table with table statistics if I change
the table's data files.

This is the scenario in detail:
1) create external table my_table (...) location 'my_location';
2) analyze table my_table compute statistics;
3) change/add/delete one or more files in 'my_location' directory;
4) select count(*) from my_table;

In this case the count query doesn't generate a MR job and returns the result based on table
statistics. This result is wrong because it is based on statistics stored in the Hive metastore
and doesn't take into account modifications made to the data files.
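
The scenario above can be sketched in HiveQL roughly as follows. The column list, the
delimiter and the '/tmp/my_location' path are hypothetical placeholders, not taken from the
original report; the DESCRIBE and EXPLAIN statements are only there to inspect the stored
statistics and the query plan.

```sql
-- Hypothetical schema and path, for illustration only.
CREATE EXTERNAL TABLE my_table (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/tmp/my_location';

-- Store table-level statistics (including numRows) in the metastore.
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- The table parameters should now list numRows as computed above.
DESCRIBE FORMATTED my_table;

-- ... change/add/delete files under the table location outside of Hive ...

-- With hive.compute.query.using.stats=true, the plan is expected to contain
-- only a fetch stage (no MR stage), i.e. the answer comes from the metastore.
EXPLAIN SELECT count(*) FROM my_table;

-- Returns the numRows recorded at ANALYZE time, not the current row count.
SELECT count(*) FROM my_table;
```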

Obviously, setting "hive.compute.query.using.stats" to FALSE avoids this problem, but
the default value of this property is TRUE.
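
As a minimal sketch of that workaround, assuming the same my_table as in the steps above, the
property can be turned off for the current session. Re-running ANALYZE after the files change
is a second option that is not part of the original report; it only refreshes the stored
statistics and does not address the underlying behaviour.

```sql
-- Session-level workaround: force count(*) to scan the data files.
SET hive.compute.query.using.stats=false;
SELECT count(*) FROM my_table;

-- Alternative (an assumption, not from the report): refresh the statistics
-- every time the files under the table location are modified.
ANALYZE TABLE my_table COMPUTE STATISTICS;
```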

I think that this post on Stack Overflow, which shows another type of bug in the case of
multiple inserts, is also related to the one that I reported:

This message was sent by Atlassian JIRA
