hadoop-common-issues mailing list archives

From "Roland von Herget (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-8640) DU thread transient failures propagate to callers
Date Tue, 08 Oct 2013 06:29:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roland von Herget updated HADOOP-8640:

    Affects Version/s: 1.2.1

> DU thread transient failures propagate to callers
> -------------------------------------------------
>                 Key: HADOOP-8640
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8640
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, io
>    Affects Versions: 2.0.0-alpha, 1.2.1
>            Reporter: Todd Lipcon
> When running some stress tests, I saw a failure where the DURefreshThread failed due
> to the filesystem changing underneath it:
> {code}
> org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access `/data/4/dfs/dn/current/BP-1928785663-': No such file or directory
> {code}
> (the block was probably finalized while the du process was running, which caused it to
> The next block write, then, called {{getUsed()}}, and the exception got propagated causing
> the write to fail. Since it was a pseudo-distributed cluster, the client was unable to pick
> a different node to write to and failed.
> The current behavior of propagating the exception to the next (and only the next) caller
> doesn't seem well-thought-out.
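
One possible alternative to propagating the refresh failure, sketched under assumptions: the refresh thread keeps the last successfully measured value and only counts the failure, so a transient du error never reaches a caller of getUsed(). This is not the Hadoop DU class and not the fix adopted for this issue. The class name CachedDiskUsage, the UsageProbe interface, and the failure counter are all hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not the actual org.apache.hadoop.fs.DU implementation. */
class CachedDiskUsage {

    /** Hypothetical stand-in for running `du` and parsing its output. */
    interface UsageProbe {
        long measure() throws IOException;
    }

    // Last successfully measured usage, in bytes.
    private final AtomicLong used;
    // Number of refresh attempts that have failed in a row.
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    CachedDiskUsage(long initialUsed) {
        this.used = new AtomicLong(initialUsed);
    }

    /** Called from the refresh thread. A failed probe is recorded, not rethrown. */
    void refresh(UsageProbe probe) {
        try {
            used.set(probe.measure());
            consecutiveFailures.set(0);
        } catch (IOException e) {
            // Transient failure (for example, a directory removed while du ran):
            // keep the previous value so callers of getUsed() are unaffected.
            consecutiveFailures.incrementAndGet();
        }
    }

    /** Returns the last known good value; never throws because of a failed refresh. */
    long getUsed() {
        return used.get();
    }

    /** Lets callers or monitoring decide when repeated failures are no longer transient. */
    int getConsecutiveFailures() {
        return consecutiveFailures.get();
    }
}
```

With this shape, a block write that calls getUsed() right after a failed refresh sees a slightly stale number instead of an exception. The counter is one way to escalate if failures persist; where to set that threshold is left open here.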

This message was sent by Atlassian JIRA
