hadoop-common-issues mailing list archives

From "Joshua Caplan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-9577) Actual data loss using s3n
Date Sat, 18 May 2013 04:49:15 GMT
Joshua Caplan created HADOOP-9577:
-------------------------------------

             Summary: Actual data loss using s3n
                 Key: HADOOP-9577
                 URL: https://issues.apache.org/jira/browse/HADOOP-9577
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Joshua Caplan
            Priority: Critical


The implementation of needsTaskCommit() assumes that the FileSystem used for writing temporary
outputs is consistent.  That is not the case when using the S3 native filesystem (s3n)
in the US Standard region.  In larger jobs it is quite common for the exists() call
to return false even though the task attempt wrote output minutes earlier, which cancels
the commit operation without raising any error.  That's real-life data loss right there, folks.
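
To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use Hadoop or S3 at all: LaggyStore is a hypothetical toy stand-in for an eventually consistent store, and the needsTaskCommit() method below only mirrors the shape of the check described above (commit only if the temporary output is visible). The path string and the tick-based lag are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class EventualConsistencyDemo {

    /** Hypothetical toy store: a written path becomes visible only after `lag` ticks. */
    static class LaggyStore {
        private final Map<String, Long> visibleAt = new HashMap<>();
        private final long lag;
        private long now = 0;

        LaggyStore(long lag) {
            this.lag = lag;
        }

        /** Records the write, but delays its visibility to exists(). */
        void write(String path) {
            visibleAt.put(path, now + lag);
        }

        /** Returns true only once the visibility delay has elapsed. */
        boolean exists(String path) {
            Long t = visibleAt.get(path);
            return t != null && now >= t;
        }

        /** Moves the simulated clock forward. */
        void advance(long ticks) {
            now += ticks;
        }
    }

    /** Same shape as the check in question: no visible temp output means no commit. */
    static boolean needsTaskCommit(LaggyStore fs, String taskAttemptPath) {
        return fs.exists(taskAttemptPath);
    }

    public static void main(String[] args) {
        LaggyStore fs = new LaggyStore(5);
        String path = "_temporary/attempt_0001"; // illustrative path only
        fs.write(path);

        // The data was written, but the probe cannot see it yet,
        // so the commit would be skipped without any error.
        System.out.println("right after write: " + needsTaskCommit(fs, path));

        fs.advance(5);
        System.out.println("after the lag:     " + needsTaskCommit(fs, path));
    }
}
```

The point of the sketch is that nothing throws: the first call simply answers "no commit needed", and the output written by the task attempt is never promoted.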

The saddest part is that the Hadoop APIs do not seem to provide any legitimate means for the
various RecordWriters to communicate with the OutputCommitter.  As a workaround, in my projects
I have created a static map of semaphores keyed by TaskAttemptID, which all my custom
RecordWriters have to be aware of.  That's pretty lame.
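
A rough sketch of what such a workaround might look like follows. This is a guess at the general shape, not the reporter's actual code: the class name TaskOutputRegistry and its methods are hypothetical, the key is a plain String standing in for a TaskAttemptID so the example compiles without Hadoop on the classpath, and it assumes the RecordWriters and the committer run in the same JVM.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

/** Hypothetical in-JVM side channel between RecordWriters and an OutputCommitter. */
public final class TaskOutputRegistry {

    // One semaphore per task attempt; a permit means "some writer produced output".
    private static final ConcurrentMap<String, Semaphore> WRITTEN =
            new ConcurrentHashMap<>();

    private TaskOutputRegistry() {
    }

    /** A RecordWriter would call this once it has written output for the attempt. */
    public static void markWritten(String taskAttemptId) {
        WRITTEN.computeIfAbsent(taskAttemptId, id -> new Semaphore(0)).release();
    }

    /** A custom committer could consult this instead of probing the filesystem. */
    public static boolean hasOutput(String taskAttemptId) {
        Semaphore s = WRITTEN.get(taskAttemptId);
        return s != null && s.availablePermits() > 0;
    }

    /** Drops the entry after commit or abort so the static map does not grow. */
    public static void clear(String taskAttemptId) {
        WRITTEN.remove(taskAttemptId);
    }
}
```

Under these assumptions, a custom needsTaskCommit() could return TaskOutputRegistry.hasOutput(id) rather than relying on exists(). The drawback is the one noted above: every custom RecordWriter has to know about the registry.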

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
