hive-dev mailing list archives

From "Jim Krehl (JIRA)" <>
Subject [jira] [Created] (HIVE-3587) Lost data during INSERT query
Date Tue, 16 Oct 2012 23:31:03 GMT
Jim Krehl created HIVE-3587:

             Summary: Lost data during INSERT query
                 Key: HIVE-3587
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.9.0
         Environment: Ubuntu 10.04
                      Hadoop MapReduce 0.20.2
                      Cloudera 4.1.0
                      3 data/task nodes
            Reporter: Jim Krehl
            Priority: Critical

I'm trying to load a table using an INSERT query [1], but not all of the data makes it from
the original table into the new table.  The query generates 2 jobs.  The first job takes about
45 minutes with mapred.mapper.class =
and the second takes ~10 seconds with mapred.mapper.class = org.apache.hadoop.hive.ql.exec.ExecMapper.
Toward the end of the first job (less than 2 minutes before it finishes), a number of
IOExceptions are raised [2].  The exceptions are raised only in the last mapper task to
complete; the other mapper tasks complete successfully.  The exceptions indicate that an
expected temporary file is missing.  The second job completes without any errors.  According
to the task tracker web interface, the jobs run sequentially with no overlap.  However, the
second job spawns a number of tasks that rename the very temporary files whose absence causes
the failures in the first job [3].


[2] Example: ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
File does not exist. Holder DFSClient_NONMAPREDUCE_-672101740_1 does not have any open files.

[3] Example: 2012-10-16 15:36:57,605 INFO RCFileMergeMapper: renamed path hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_task_tmp.-ext-10000/month=2012-01/_tmp.000011_0
to hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_tmp.-ext-10000/month=2012-01/000011_0 . File size is 3482
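
One way to check whether the renamed files are the same ones the first job failed to close is to scrape both kinds of log lines and intersect the paths. The following is only a rough sketch: the two regular expressions are inferred from the single examples in [2] and [3] and are not a documented log format, and the function names are hypothetical. The examples quoted above come from different query runs and different task files, so they would not match each other; the script is meant to be run over the complete task logs of a single run.

```python
import re

# Assumed format of the "renamed path" lines, inferred from example [3].
RENAME_RE = re.compile(
    r"renamed path (?P<src>\S+)\s+to\s+(?P<dst>\S+)\s*\.\s*File size is (?P<size>\d+)"
)

# Assumed format of the "Failed to close file" lines, inferred from example [2].
FAILED_RE = re.compile(r"Failed to close file (?P<path>\S+)")


def strip_scheme(path):
    """Drop a leading hdfs://host prefix so paths from both logs compare equal."""
    return re.sub(r"^hdfs://[^/]+", "", path)


def renamed_sources(log_text):
    """Source paths of every rename reported in the log text."""
    return {strip_scheme(m.group("src")) for m in RENAME_RE.finditer(log_text)}


def failed_paths(log_text):
    """Paths of every file the DFS client reported it failed to close."""
    return {strip_scheme(m.group("path")) for m in FAILED_RE.finditer(log_text)}


def renamed_and_failed(failed_log, rename_log):
    """Paths that both failed to close and appear as the source of a rename."""
    return failed_paths(failed_log) & renamed_sources(rename_log)
```

A non-empty result from renamed_and_failed() for logs of the same run would support the suspicion that the rename tasks move the temporary files out from under the still-running mapper.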

This message is automatically generated by JIRA.