hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5546) race condition crashes "hadoop ls -R" when directories are moved/removed
Date Mon, 23 Jun 2014 19:20:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041159#comment-14041159 ]

Colin Patrick McCabe commented on HDFS-5546:

I think what Daryn is advocating is that when attempting to recurse into a directory, we should
catch {{IOException}} (IOE) for the {{listStatus}} operation, not just {{FileNotFoundException}} (FNF).
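
To make the two policies concrete, here is a minimal sketch of a recursive listing that either tolerates only FNF or tolerates any IOE while descending. It does not use Hadoop classes: the {{Lister}} interface, the {{walk}} and {{demoFs}} helpers, and the {{/gone/}}, {{/locked/}} and {{/ok/}} paths are all hypothetical stand-ins, and a permission failure is modeled as a plain {{IOException}} (Hadoop's {{AccessControlException}} is an {{IOException}} subclass).

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RecursiveLsSketch {
    /** Hypothetical stand-in for the file system; not a Hadoop interface. */
    interface Lister {
        /** Returns the children of dir; names ending in "/" are directories. */
        List<String> list(String dir) throws IOException;
    }

    /**
     * Recursively lists dir. A vanished directory (FNF) is always recorded and
     * skipped. Any other IOException is recorded and skipped only when
     * tolerateAnyIoe is true; otherwise it aborts the whole walk.
     */
    static void walk(Lister fs, String dir, boolean tolerateAnyIoe,
                     List<String> out, List<String> errors) throws IOException {
        List<String> children;
        try {
            children = fs.list(dir);
        } catch (FileNotFoundException e) {
            errors.add("ls: " + e.getMessage());
            return;
        } catch (IOException e) {
            if (!tolerateAnyIoe) {
                throw e;
            }
            errors.add("ls: " + e.getMessage());
            return;
        }
        for (String child : children) {
            out.add(child);
            if (child.endsWith("/")) {
                walk(fs, child, tolerateAnyIoe, out, errors);
            }
        }
    }

    /** Toy tree: /gone/ was removed after its parent was listed, /locked/ is unreadable. */
    static Lister demoFs() {
        Map<String, List<String>> tree = new TreeMap<>();
        tree.put("/", Arrays.asList("/gone/", "/locked/", "/ok/"));
        tree.put("/ok/", Arrays.asList("/ok/file"));
        return dir -> {
            if (dir.equals("/locked/")) {
                throw new IOException("Permission denied: " + dir);
            }
            List<String> kids = tree.get(dir);
            if (kids == null) {
                throw new FileNotFoundException("No such directory: " + dir);
            }
            return kids;
        };
    }

    public static void main(String[] args) throws IOException {
        List<String> out = new ArrayList<>();
        List<String> errors = new ArrayList<>();
        walk(demoFs(), "/", true, out, errors);
        System.out.println(out);
        System.out.println(errors);
    }
}
```

With {{tolerateAnyIoe}} set to false, the walk still survives the vanished {{/gone/}} directory but stops at {{/locked/}}, so {{/ok/file}} is never reported.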

Although this makes sense to me, there is a bit of a fly in the ointment: if we have a glob
expression like {{/\*/\*}}, the Globber internally will throw an exception if there is a path
error while resolving the globs.  For example, if you have {{/a/b/c}} and {{/a/r/c}}, and
{{/a/r}} is inaccessible to you, {{ls /\*/\*/c}} will fail with an {{AccessControlException}}
before displaying anything.

This behavior has existed basically forever in the globber code (it wasn't added by the globber
rewrite) and unfortunately, there is no good way to fix it now.  The problem is that there
is no way to indicate that we got an error other than throwing an exception, and an exception
terminates the whole glob operation, even if there were other valid results.  So in the interest
of consistency, perhaps we should keep things the way they are, and only catch FNF?  {{ls
/a/b/c /a/r/c}} seems similar conceptually to {{ls /\*/\*/c}}... it is tricky to explain why
an exception should terminate one but not the other...
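
The all-or-nothing behavior described above can be illustrated with a toy glob expander. This is a sketch under stated assumptions, not the real Globber: the in-memory {{TREE}}, the {{DENIED}} set, and the {{list}} and {{glob}} helpers are hypothetical, only {{*}} and literal components are handled, and the permission failure is modeled as a plain {{IOException}}.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class GlobAllOrNothingSketch {
    // Toy namespace: directory -> child names. An in-memory stand-in, not HDFS.
    static final Map<String, List<String>> TREE = new TreeMap<>();
    // Directories the caller is not allowed to read.
    static final Set<String> DENIED = new TreeSet<>();
    static {
        TREE.put("/", Arrays.asList("a"));
        TREE.put("/a", Arrays.asList("b", "r"));
        TREE.put("/a/b", Arrays.asList("c"));
        TREE.put("/a/r", Arrays.asList("c"));
        DENIED.add("/a/r");
    }

    static List<String> list(String dir) throws IOException {
        if (DENIED.contains(dir)) {
            throw new IOException("Permission denied: " + dir);
        }
        List<String> kids = TREE.get(dir);
        return kids == null ? new ArrayList<>() : kids;
    }

    /** Expands an absolute pattern whose components are "*" or literal names. */
    static List<String> glob(String pattern) throws IOException {
        List<String> candidates = new ArrayList<>();
        candidates.add("/");
        for (String component : pattern.substring(1).split("/")) {
            List<String> next = new ArrayList<>();
            for (String dir : candidates) {
                // The only error channel is the exception, so a failure here
                // discards every match found so far.
                for (String child : list(dir)) {
                    if (component.equals("*") || component.equals(child)) {
                        next.add(dir.equals("/") ? "/" + child : dir + "/" + child);
                    }
                }
            }
            candidates = next;
        }
        return candidates;
    }

    public static void main(String[] args) {
        try {
            System.out.println(glob("/*/*/c"));
        } catch (IOException e) {
            // /a/b/c was a valid match, but it is lost along with the failure.
            System.out.println("glob failed: " + e.getMessage());
        }
    }
}
```

In this sketch {{glob("/a/b/c")}} succeeds because it never reads {{/a/r}}, while {{glob("/*/*/c")}} throws even though {{/a/b/c}} had already matched.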

Eddy, can you take a look at the internal JIRA that prompted this and see if it was user error?
 I'm less and less convinced we should change {{ls -R}}...

> race condition crashes "hadoop ls -R" when directories are moved/removed
> ------------------------------------------------------------------------
>                 Key: HDFS-5546
>                 URL: https://issues.apache.org/jira/browse/HDFS-5546
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Lei (Eddy) Xu
>            Priority: Minor
>             Fix For: 3.0.0
>         Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch, HDFS-5546.2.001.patch,
>                      HDFS-5546.2.002.patch, HDFS-5546.2.003.patch
> This seems to be a rare race condition where we have a sequence of events like this:
> 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D.
> 2. someone deletes or moves directory D
> 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException.
> 4. ls command terminates with FNF
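
The sequence above is a check-then-act race, and its shape can be reproduced deterministically on a local file system. The sketch below is only an analogy built on {{java.nio.file}}, not the HDFS code path: the class and method names are hypothetical, the delete is done inline rather than by a concurrent client, and the local API reports the vanished directory as {{NoSuchFileException}} rather than {{FileNotFoundException}}.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class StatThenListRaceSketch {
    /** Runs stat, delete, list in order and returns a one-line description of the outcome. */
    static String statDeleteList() throws IOException {
        Path d = Files.createTempDirectory("race-demo");

        // Step 1: the "stat" says D is a directory.
        boolean wasDirectory = Files.isDirectory(d);

        // Step 2: someone removes D before we list it (done inline here).
        Files.delete(d);

        // Step 3: listing D fails because it no longer exists.
        try (DirectoryStream<Path> children = Files.newDirectoryStream(d)) {
            int count = 0;
            for (Path ignored : children) {
                count++;
            }
            return "listed " + count + " entries";
        } catch (NoSuchFileException e) {
            // Step 4 would be the command terminating; here the failure is reported instead.
            return "stat said directory=" + wasDirectory + ", but the directory vanished before listing";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(statDeleteList());
    }
}
```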

This message was sent by Atlassian JIRA
