hive-user mailing list archives

From Ning Zhang <nzh...@facebook.com>
Subject Re: wrong number of records loaded to a table is returned by Hive
Date Fri, 01 Oct 2010 17:45:53 GMT
Ping, this is a known issue. The number reported at the end of INSERT OVERWRITE is obtained
from Hadoop counters, which are not very reliable and can be inaccurate when tasks fail or
are speculatively executed.
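
To make the failure mode concrete, here is a toy Python model in which a row count summed over every task attempt drifts away from the rows that actually end up in the table's committed output. It is an illustration under stated assumptions, not how Hadoop or Hive actually fold counters together; the Attempt class, task ids, and row counts are all made up.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    task_id: str       # logical task, e.g. "m_000001" (hypothetical ids)
    rows_written: int  # rows this attempt reported through its counter
    committed: bool    # True if this attempt's output file was kept


def counter_total(attempts):
    # Naive total (toy assumption): adds the counter of every attempt that ran,
    # including failed and discarded speculative attempts.
    return sum(a.rows_written for a in attempts)


def committed_total(attempts):
    # What "select count(1)" would see: rows from committed attempts only.
    return sum(a.rows_written for a in attempts if a.committed)


attempts = [
    Attempt("m_000000", 100, True),
    Attempt("m_000001", 40, False),   # failed part-way through, then retried
    Attempt("m_000001", 100, True),   # the retry that succeeded
    Attempt("m_000002", 100, True),
    Attempt("m_000002", 100, False),  # speculative duplicate, output discarded
]

print(counter_total(attempts))    # 440
print(committed_total(attempts))  # 300
```

Whether a real job over- or under-reports depends on how counters from failed and speculative attempts are handled; the point is only that a counter total and the committed data can disagree, so a count over the table itself is the number to trust.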

If you are using the latest trunk, you may want to try the new feature that automatically gathers
statistics during INSERT OVERWRITE TABLE. You need to set up a MySQL or HBase instance for
publishing and aggregating partial stats. You can find the design doc at http://wiki.apache.org/hadoop/Hive/StatsDev.
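
A rough sketch of the publish/aggregate idea, in Python, with an in-memory dict standing in for the MySQL or HBase store. The class name, the (prefix, task_id) key layout, and the overwrite-on-retry behavior are assumptions for illustration only; see the StatsDev design doc for the actual schema and flow.

```python
class PartialStatsStore:
    """Hypothetical in-memory stand-in for the MySQL/HBase partial-stats table."""

    def __init__(self):
        self._rows = {}  # (prefix, task_id) -> row count

    def publish(self, prefix, task_id, row_count):
        # Assumed behavior: keyed by task id, so a retried or speculative
        # attempt of the same task overwrites instead of double-counting.
        self._rows[(prefix, task_id)] = row_count

    def aggregate(self, prefix):
        # Sum the partial counts published under one table/partition prefix.
        return sum(n for (p, _), n in self._rows.items() if p == prefix)


if __name__ == "__main__":
    store = PartialStatsStore()
    store.publish("target/", "m_000000", 100)
    store.publish("target/", "m_000001", 100)
    store.publish("target/", "m_000001", 100)  # duplicate attempt, same key
    store.publish("other/", "m_000000", 7)
    print(store.aggregate("target/"))  # 200
```

Keying on the task rather than the individual attempt is what keeps a duplicate attempt from being counted twice, which is exactly where a plain counter sum can go wrong.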

Note that the stats feature is still at an experimental stage, so please feel free to report
bugs or suggestions here or to hive-dev@hadoop.apache.org.

On Oct 1, 2010, at 10:30 AM, Ping Zhu wrote:

I had such issues on different versions of Hadoop/Hive. The version I am using now is
Hadoop 0.20.2/Hive 0.7; the version I used before was Hadoop 0.20.0/Hive 0.5.

Ping

On Fri, Oct 1, 2010 at 10:23 AM, Ping Zhu <ping@sharethis.com> wrote:
Hi,

  I ran a simple Hive query that inserts data into a target table from a source table. The
number of records loaded into the target table (say, number A), as reported by this query,
differs from the number (say, number B) returned by the query "select count(1) from target".
I checked the number of rows in the target table's HDFS files by running "hadoop fs -cat
/root/hive/metastore_db/ptarget/* | wc -l". The result is number B, so I believe number B
is the actual number of rows in the target table.
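
To repeat the same check on a local copy of the table directory (for example after `hadoop fs -get`), here is a small Python sketch of `cat dir/* | wc -l`. Like `wc -l`, it counts newline characters, so the result equals the row count only for uncompressed text files with one row per line and no embedded newlines. The function name and the demo file names are made up for illustration.

```python
import glob
import os
import tempfile


def count_lines(pattern):
    """Count newline characters across all regular files matching `pattern`,
    like `cat pattern | wc -l` (subdirectories are skipped)."""
    total = 0
    for path in sorted(p for p in glob.glob(pattern) if os.path.isfile(p)):
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large part files need not fit in memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                total += chunk.count(b"\n")
    return total


if __name__ == "__main__":
    # Demo on two small, made-up part files in a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        for name, text in [("part-00000", "a\tb\nc\td\n"), ("part-00001", "e\tf\n")]:
            with open(os.path.join(d, name), "w") as f:
                f.write(text)
        print(count_lines(os.path.join(d, "*")))  # 3
```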

  I have run into this issue intermittently. Any comments?

  Thank you very much.

  Ping


