hadoop-common-dev mailing list archives

From "Dick King (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-444) In streaming with a NONE reducer, you get duplicate files if a mapper fails, is restarted, and succeeds next time.
Date Thu, 10 Aug 2006 19:23:13 GMT
In streaming with a NONE reducer, you get duplicate files if a mapper fails, is restarted,
and succeeds next time.
------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-444
                 URL: http://issues.apache.org/jira/browse/HADOOP-444
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 0.5.0
            Reporter: Dick King


After a streaming run had finished, the output directory contained these two files:

  /user/dking/<project-name>/K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_0	<r 3>	10563406
  /user/dking/<project-name>/K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_1	<r 3>	10563406

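Both attempts of map task 007384 (suffixes _0 and _1) left an output file of the same size, so any later job that reads this directory will receive that data twice.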
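
Until this is fixed, one way to spot affected directories is to group the output file names by task and flag any task with more than one attempt on disk. The following is only a sketch: the helper name find_duplicate_attempts is hypothetical, and the task_<job>_m_<task>_<attempt> name layout is an assumption inferred from the two file names in the listing above.

```python
import re
from collections import defaultdict

# Assumed name layout: task_<job>_m_<task>_<attempt>, inferred from the listing.
ATTEMPT_NAME = re.compile(r"^(task_\d+_m_\d+)_(\d+)$")


def find_duplicate_attempts(paths):
    """Return {task id: sorted attempt numbers} for tasks with several outputs."""
    attempts = defaultdict(list)
    for path in paths:
        # Keep only the last path component, e.g. task_0026_m_007384_0.
        name = path.rstrip("/").rsplit("/", 1)[-1]
        match = ATTEMPT_NAME.match(name)
        if match:
            attempts[match.group(1)].append(int(match.group(2)))
    # A task is suspicious when more than one attempt left a file behind.
    return {task: sorted(nums) for task, nums in attempts.items() if len(nums) > 1}


listing = [
    "/user/dking/<project-name>/K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_0",
    "/user/dking/<project-name>/K-HTML-UTF8-2006-08-09-rescued-abstracted/task_0026_m_007384_1",
]
print(find_duplicate_attempts(listing))  # {'task_0026_m_007384': [0, 1]}
```

This only reports the duplicates. Which attempt's file to keep is a separate decision that the sketch does not make.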

-dk



        
