hadoop-common-issues mailing list archives

From "$iddhe$h Divekar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11487) FileNotFound on distcp to s3n/s3a due to creation inconsistency
Date Thu, 21 Jul 2016 21:00:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388417#comment-15388417 ]

$iddhe$h Divekar commented on HADOOP-11487:
-------------------------------------------

Hi Chris,
Thanks for replying.
As per the AWS forum, all S3 regions now support read-after-write consistency for new objects added to Amazon S3.
https://forums.aws.amazon.com/ann.jspa?annID=3112
Does listStatus fall outside that read-after-write guarantee?
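
For anyone following along, a minimal sketch of the two calls in question (the path comes from the stack trace below; the class name and printouts are only illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConsistencyCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("s3n://s3-bucket/file.gz");
    FileSystem fs = path.getFileSystem(conf);

    // getFileStatus() is a per-key lookup of the new object, which is the
    // case the read-after-write announcement covers.
    FileStatus status = fs.getFileStatus(path);
    System.out.println("getFileStatus sees: " + status.getPath());

    // listStatus() walks a listing of the parent prefix; if listings are
    // still only eventually consistent, a just-written key may be missing here.
    for (FileStatus entry : fs.listStatus(path.getParent())) {
      System.out.println("listStatus sees: " + entry.getPath());
    }
  }
}
{code}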

For Hadoop 2.7 we started using s3a as per the Spark recommendations, but after moving to s3a we saw a 3x performance degradation, so we moved back to s3n.

When will the patch be available for general use?
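
A rough sketch of the retry approach Paulo suggests at the bottom of the quoted issue (illustrative only, not the actual patch; it simply wraps getFileStatus with the existing fs.s3.maxRetries / fs.s3.sleepTimeSeconds settings, and the helper name is made up):

{code:java}
import java.io.FileNotFoundException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetFileStatusWithRetry {
  // Re-issue getFileStatus up to fs.s3.maxRetries times before giving up,
  // sleeping fs.s3.sleepTimeSeconds between attempts (defaults shown here
  // are what I believe core-default.xml ships with).
  public static FileStatus getFileStatusWithRetry(FileSystem fs, Path path, Configuration conf)
      throws Exception {
    int maxRetries = conf.getInt("fs.s3.maxRetries", 4);
    long sleepMillis = conf.getLong("fs.s3.sleepTimeSeconds", 10) * 1000L;
    for (int attempt = 0; ; attempt++) {
      try {
        return fs.getFileStatus(path);   // throws FileNotFoundException on a stale lookup
      } catch (FileNotFoundException e) {
        if (attempt >= maxRetries) {
          throw e;                       // exhausted the configured retries
        }
        Thread.sleep(sleepMillis);       // give S3 time to become consistent
      }
    }
  }
}
{code}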

> FileNotFound on distcp to s3n/s3a due to creation inconsistency 
> ----------------------------------------------------------------
>
>                 Key: HADOOP-11487
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11487
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, fs/s3
>    Affects Versions: 2.7.2
>            Reporter: Paulo Motta
>
> I'm trying to copy a large number of files from HDFS to S3 via distcp and I'm getting the following exception:
> {code:java}
> 2015-01-16 20:53:18,187 ERROR [main] org.apache.hadoop.tools.mapred.CopyMapper: Failure in copying hdfs://10.165.35.216/hdfsFolder/file.gz to s3n://s3-bucket/file.gz
> java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
> 	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
> 	at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
> 	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
> 	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> 2015-01-16 20:53:18,276 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
> 	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
> 	at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
> 	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
> 	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> {code}
> However, when I try hadoop fs -ls s3n://s3-bucket/file.gz the file is there, so the job failure is probably due to Amazon S3's eventual consistency.
> In my opinion, in order to fix this problem NativeS3FileSystem.getFileStatus must use the fs.s3.maxRetries property to avoid failures like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


