hive-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-17421) Clear incorrect stats after replication
Date Thu, 31 Aug 2017 21:04:00 GMT
Daniel Dai created HIVE-17421:
---------------------------------

             Summary: Clear incorrect stats after replication
                 Key: HIVE-17421
                 URL: https://issues.apache.org/jira/browse/HIVE-17421
             Project: Hive
          Issue Type: Bug
          Components: repl
            Reporter: Daniel Dai
            Assignee: Daniel Dai


After replication, some summary statistics are incorrect. If hive.compute.query.using.stats is set
to true, queries on the destination side will return wrong results.
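
A minimal Java sketch of why the wrong result appears. It uses a simplified model in which a table is a parameter map plus its actual rows, and a stats-based shortcut trusts numRows whenever COLUMN_STATS_ACCURATE is present. StatsCountSketch and countStar are hypothetical names, not Hive APIs. Hive's real optimizer applies more detailed checks than this presence test.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Hive's API: table parameters are a plain map.
public class StatsCountSketch {

    static final String COLUMN_STATS_ACCURATE = "COLUMN_STATS_ACCURATE";
    static final String NUM_ROWS = "numRows";

    // Answer count(*) the way a stats-based shortcut might: if the option is on and the
    // parameters mark the stats as accurate, return numRows instead of scanning the rows.
    // Simplification: only the presence of COLUMN_STATS_ACCURATE is checked.
    static long countStar(Map<String, String> params, List<String> rows,
                          boolean computeQueryUsingStats) {
        if (computeQueryUsingStats
                && params.containsKey(COLUMN_STATS_ACCURATE)
                && params.containsKey(NUM_ROWS)) {
            return Long.parseLong(params.get(NUM_ROWS));
        }
        return rows.size(); // fall back to scanning the actual data
    }

    public static void main(String[] args) {
        // Destination table after incremental replication: the data arrived,
        // but the stats update did not, so numRows is still 0.
        Map<String, String> params = new HashMap<>();
        params.put(NUM_ROWS, "0");
        params.put(COLUMN_STATS_ACCURATE, "{\"BASIC_STATS\":\"true\"}"); // illustrative value
        List<String> rows = List.of("r1", "r2", "r3");

        System.out.println(countStar(params, rows, true));  // 0, answered from stale stats
        System.out.println(countStar(params, rows, false)); // 3, answered by scanning
    }
}
```

With the option off, the same table yields the correct count, which matches the report that the problem is tied to hive.compute.query.using.stats.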

This does not happen with bootstrap replication, because the stats summary is stored in table
properties, which are replicated to the destination. In incremental replication, however, this
does not work. When a table is created, its stats summary is empty (e.g., numRows=0). When data
is later inserted, the stats summary is updated through update_table_column_statistics/update_partition_column_statistics,
but neither event is captured in incremental replication. As a result, a count(*) answered from
stats on the destination side returns 0. The simple solution is to remove the COLUMN_STATS_ACCURATE
property after incremental replication.
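
A minimal Java sketch of the proposed cleanup. It assumes the destination table's and partitions' properties can be modeled as plain parameter maps. ReplStatsCleanupSketch and clearStatsAccurate are hypothetical names, not part of Hive's replication code. The sketch drops only the COLUMN_STATS_ACCURATE marker and leaves numRows and the other values in place.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hive's replication code: parameter maps stand in for
// the destination table's and partitions' metastore properties.
public class ReplStatsCleanupSketch {

    static final String COLUMN_STATS_ACCURATE = "COLUMN_STATS_ACCURATE";

    // After an incremental load, remove the "stats are accurate" marker from the table
    // and every partition so stale stats are no longer trusted.
    // Returns how many parameter maps actually contained the marker.
    static int clearStatsAccurate(Map<String, String> tableParams,
                                  List<Map<String, String>> partitionParams) {
        int changed = 0;
        if (tableParams.remove(COLUMN_STATS_ACCURATE) != null) {
            changed++;
        }
        for (Map<String, String> p : partitionParams) {
            if (p.remove(COLUMN_STATS_ACCURATE) != null) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("numRows", "0");
        table.put(COLUMN_STATS_ACCURATE, "{\"BASIC_STATS\":\"true\"}"); // illustrative value

        Map<String, String> part = new HashMap<>();
        part.put("numRows", "0");
        part.put(COLUMN_STATS_ACCURATE, "{\"BASIC_STATS\":\"true\"}");
        List<Map<String, String>> parts = new ArrayList<>();
        parts.add(part);

        System.out.println(clearStatsAccurate(table, parts));         // 2
        System.out.println(table.containsKey(COLUMN_STATS_ACCURATE)); // false
        System.out.println(table.get("numRows"));                     // 0, kept but no longer marked accurate
    }
}
```

Removing only the marker is the conservative choice. No stats value has to be guessed on the destination, and a stats-based shortcut that requires the marker falls back to reading the data.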



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
