hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
Date Tue, 28 Aug 2012 22:21:08 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-3864:
---------------------------------

    Attachment: HDFS-3864.patch

Thanks a lot for the quick review, Todd.

Here's an updated patch which lowers the sleep time to 10 milliseconds.
                
> NN does not update internal file mtime for OP_CLOSE when reading from the edit log
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-3864
>                 URL: https://issues.apache.org/jira/browse/HDFS-3864
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-3864.patch, HDFS-3864.patch
>
>
> When logging an OP_CLOSE to the edit log, the NN writes out an updated file mtime and
> atime. However, when reading in an OP_CLOSE from the edit log, the NN does not apply these
> values to the in-memory FS data structure. Because of this, a file's mtime or atime may appear
> to go back in time after an NN restart, or an HA failover.
> Most of the time this will be harmless and folks won't notice, but in the event one of
> these files is being used in the distributed cache of an MR job when an HA failover occurs,
> the job might notice that the mtime of a cache file has changed, which in MR2 will cause the
> job to fail with an exception like the following:
> {noformat}
> java.io.IOException: Resource hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar changed on src filesystem (expected 1342137814599, was 1342137814473
> 	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
> 	at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
> 	at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
> 	at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
> 	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Credit to Sujay Rau for discovering this issue.
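
To make the failure mode in the description concrete, here is a minimal, self-contained sketch of the replay behavior. It is not the actual NameNode code or the attached patch: `Inode`, `CloseOp`, `replayClose`, `unchangedSince`, and the file path are all hypothetical names made up for illustration. The two timestamps are taken from the exception above; treating the earlier one as the value the in-memory inode held before the close is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class EditLogReplaySketch {

    /** Hypothetical stand-in for an in-memory file inode. */
    static final class Inode {
        long mtime;
        long atime;

        Inode(long mtime, long atime) {
            this.mtime = mtime;
            this.atime = atime;
        }
    }

    /** Hypothetical stand-in for an OP_CLOSE record read from the edit log. */
    static final class CloseOp {
        final String path;
        final long mtime;
        final long atime;

        CloseOp(String path, long mtime, long atime) {
            this.path = path;
            this.mtime = mtime;
            this.atime = atime;
        }
    }

    /**
     * Replays a close op against the in-memory namespace.
     * With applyTimes == false this mimics the reported bug: the logged
     * mtime/atime are read but never copied onto the inode.
     */
    static void replayClose(Map<String, Inode> namespace, CloseOp op, boolean applyTimes) {
        Inode inode = namespace.get(op.path);
        if (inode == null) {
            throw new IllegalStateException("no such file: " + op.path);
        }
        if (applyTimes) {
            inode.mtime = op.mtime;
            inode.atime = op.atime;
        }
    }

    /** Hypothetical client-side check: does the file still carry the mtime the client recorded? */
    static boolean unchangedSince(Inode inode, long expectedMtime) {
        return inode.mtime == expectedMtime;
    }

    public static void main(String[] args) {
        long beforeClose = 1342137814473L; // "was" value from the exception (assumed pre-close mtime)
        long atClose = 1342137814599L;     // "expected" value from the exception
        CloseOp op = new CloseOp("/example/libjars/example.jar", atClose, atClose);

        // Replay without applying the logged times: the inode keeps the older mtime.
        Map<String, Inode> buggy = new HashMap<>();
        buggy.put(op.path, new Inode(beforeClose, beforeClose));
        replayClose(buggy, op, false);
        System.out.println("without fix, unchanged: " + unchangedSince(buggy.get(op.path), atClose));

        // Replay applying the logged times: the inode matches what the client saw at close.
        Map<String, Inode> fixed = new HashMap<>();
        fixed.put(op.path, new Inode(beforeClose, beforeClose));
        replayClose(fixed, op, true);
        System.out.println("with fix, unchanged: " + unchangedSince(fixed.get(op.path), atClose));
    }
}
```

Running `main` prints `false` for the first replay and `true` for the second. The first case corresponds to a NameNode that rebuilt its state from the edit log (after a restart or failover) and now reports an older mtime than the one a client recorded when the file was closed.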

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira