hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-587) Duplicate result from multiple TIPs of the same task
Date Sat, 27 Jun 2009 01:40:47 GMT
Duplicate result from multiple TIPs of the same task
----------------------------------------------------

                 Key: HIVE-587
                 URL: https://issues.apache.org/jira/browse/HIVE-587
             Project: Hadoop Hive
          Issue Type: Bug
    Affects Versions: 0.3.0, 0.3.1
            Reporter: Zheng Shao
            Priority: Blocker


On our cluster we found a job that was committed with duplicate output: different TIPs of the
same task each contributed an output file (written by FileSinkOperator).

The cause is that FileSinkOperator.commit can be called in more than one TIP of the same task.

FileSinkOperator.jobClose() (which is called on the Hive client side) should do one of the following:
A. Get all successful TIPs and move only the output files of those TIPs to the output directory.
B. Ignore the TIP information from the JobInProgress and move only one file out of the potentially
several output files.

B is preferred because A might be slow (for example, if the job finished and was immediately moved
out of the JobTracker's memory). Since we control the file names ourselves, we know exactly what
they are.
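
A minimal sketch of option B follows. It is not the actual Hive code. It assumes a hypothetical
naming scheme "<taskId>_<attemptNumber>" for the committed files, and the class and method names
(DedupTaskOutputs, taskIdOf, selectOnePerTask) are invented for illustration. The idea is to group
the committed file names by task and keep exactly one name per task, so that jobClose() would move
only that file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DedupTaskOutputs {

    // ASSUMPTION: file names look like "<taskId>_<attemptNumber>", e.g. "000001_1".
    // This scheme is hypothetical and only serves the illustration.
    static String taskIdOf(String fileName) {
        int idx = fileName.lastIndexOf('_');
        return idx < 0 ? fileName : fileName.substring(0, idx);
    }

    // Returns one file name per task. When several files exist for the same task,
    // the lexicographically smallest name is kept so the choice is deterministic.
    static List<String> selectOnePerTask(List<String> fileNames) {
        Map<String, String> chosen = new TreeMap<String, String>();
        for (String name : fileNames) {
            String taskId = taskIdOf(name);
            String current = chosen.get(taskId);
            if (current == null || name.compareTo(current) < 0) {
                chosen.put(taskId, name);
            }
        }
        return new ArrayList<String>(chosen.values());
    }

    public static void main(String[] args) {
        // Task 000001 committed twice; only one of its files should be moved.
        List<String> committed =
            Arrays.asList("000000_0", "000001_0", "000001_1", "000002_1");
        System.out.println(selectOnePerTask(committed));
        // prints [000000_0, 000001_0, 000002_1]
    }
}
```

Which duplicate is kept does not matter as long as every duplicate holds the complete output of
its task; the sketch picks the smallest name only to make the result repeatable.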


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

