hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
Date Wed, 09 Apr 2014 09:41:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963976#comment-13963976 ]

Steve Loughran commented on HDFS-6143:


Having spent time looking at traces of Swift FS operations, I can say the combination of open+seek
is ubiquitous, and it is expensive over long-distance links, especially when HTTP is involved.

But we do expect {{open(path)}} to fail if it's not there; changing that is a major change
in expectations.

What would make sense in the long term is a new operation, {{openAt(Path, offset)}}. For any
of the HTTP filesystems, this would issue a GET from the offset at open time.
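As a rough illustration of the proposed {{openAt(Path, offset)}}, here is a minimal Java sketch that builds the GET for "read from offset onwards" using the Java 11 {{java.net.http}} client and a generic HTTP {{Range}} header. The class and method names are hypothetical, not part of any Hadoop API; WebHDFS itself also accepts an {{offset}} query parameter on {{op=OPEN}}, which could serve the same purpose.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/**
 * Hypothetical sketch of openAt(path, offset) for an HTTP-backed filesystem:
 * the GET starts at the requested offset, so no separate seek is needed.
 */
class OpenAtSketch {

    /** Builds the GET request for reading from {@code offset} to the end of the file. */
    static HttpRequest buildOpenAtRequest(URI fileUri, long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("Negative offset: " + offset);
        }
        HttpRequest.Builder builder = HttpRequest.newBuilder(fileUri).GET();
        if (offset > 0) {
            // "bytes=N-" asks the server for everything from byte N onwards.
            builder.header("Range", "bytes=" + offset + "-");
        }
        return builder.build();
    }

    /**
     * A 206 (Partial Content) reply means the Range was honored; a 200 means the
     * server ignored it and sent the whole file, so the client would have to
     * skip {@code offset} bytes itself.
     */
    static boolean rangeHonored(int httpStatus) {
        return httpStatus == 206;
    }
}
```

Because the request goes out at open time, a missing file can be reported from {{openAt}} itself rather than on the first read.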

Short term, looking at {{ByteRangeInputStream}}, it's inefficient: even for a single-byte
forward seek ({{seek(getPos()+1)}}), it closes the connection and re-opens it, which adds the cost
of setting up the connection and resets all flow-control data on the channel. If you look
at {{SwiftNativeInputStream}}, you can see how it does read-ahead for short-range seeks, which
is a lot more efficient for any code that is reading and skipping ahead. Someone should think
about doing that, as it would reduce the cost of those seeks.
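A minimal Java sketch of the read-ahead idea, assuming a hypothetical {{RangeOpener}} callback that opens a fresh stream at a given offset (for example, an HTTP GET with a {{Range}} header). Forward seeks within {{skipThreshold}} bytes read and discard data on the existing connection; backward or long forward seeks close and reopen. This is not the actual {{SwiftNativeInputStream}} or {{ByteRangeInputStream}} code, and the threshold is an arbitrary tuning parameter.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical callback: opens a new stream positioned at {@code offset}. */
interface RangeOpener {
    InputStream openAt(long offset) throws IOException;
}

/** Seekable stream that serves short forward seeks by skipping instead of reconnecting. */
class SkippingSeekStream extends InputStream {
    private final RangeOpener opener;
    private final long skipThreshold; // max forward distance served by read-and-discard
    private InputStream in;
    private long pos;
    private int opens; // number of connections opened so far (for illustration)

    SkippingSeekStream(RangeOpener opener, long skipThreshold) throws IOException {
        this.opener = opener;
        this.skipThreshold = skipThreshold;
        this.in = opener.openAt(0);
        this.opens = 1;
    }

    long getPos() { return pos; }

    int getOpenCount() { return opens; }

    void seek(long target) throws IOException {
        long delta = target - pos;
        if (delta == 0) {
            return;
        }
        if (delta > 0 && delta <= skipThreshold) {
            // Short forward seek: consume bytes on the existing connection.
            byte[] scratch = new byte[(int) Math.min(delta, 8192)];
            while (pos < target) {
                int want = (int) Math.min(target - pos, scratch.length);
                int n = in.read(scratch, 0, want);
                if (n < 0) {
                    throw new EOFException("Seek past end of stream: " + target);
                }
                pos += n;
            }
        } else {
            // Backward or long forward seek: drop the connection and reopen at target.
            in.close();
            in = opener.openAt(target);
            opens++;
            pos = target;
        }
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            pos += n;
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

The threshold trades bytes read and thrown away against the cost of a new connection; over a long-haul link, skipping a modest amount of data is likely cheaper than a fresh HTTP request, but the right value depends on latency and bandwidth.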

> WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
> --------------------------------------------------------------------------------
>                 Key: HDFS-6143
>                 URL: https://issues.apache.org/jira/browse/HDFS-6143
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>            Priority: Blocker
>             Fix For: 2.5.0
>         Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch,
HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch,
HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch
> WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handle non-existing paths.

> - 'open' does not really open anything, i.e., it does not contact the server, and therefore
cannot discover FileNotFound; the failure is deferred until the next read. This is counterintuitive and not
how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
is an example of code that's broken because of this.
> - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND
for non-existing paths.
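The two defects above pair up: the client can only report {{FileNotFoundException}} at open time if it contacts the server then, and only if the server answers a missing path with 404 ({{SC_NOT_FOUND}}) rather than 400 ({{SC_BAD_REQUEST}}). A minimal Java sketch of the client side, assuming a hypothetical {{StatusProbe}} that issues the initial request and returns its HTTP status; the names are illustrative and are not the code in the attached patches.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical: issues the initial GET for {@code path} and returns the HTTP status code. */
interface StatusProbe {
    int fetchStatus(String path) throws IOException;
}

class EagerOpenSketch {

    /** Maps the status of the initial request to the exception open() should throw, or null on success. */
    static IOException toException(int httpStatus, String path) {
        if (httpStatus == 200 || httpStatus == 206) {
            return null;
        }
        if (httpStatus == 404) {
            // Only reachable if the server replies SC_NOT_FOUND for missing paths.
            return new FileNotFoundException("File does not exist: " + path);
        }
        // A 400 (SC_BAD_REQUEST) lands here, indistinguishable from any other bad request.
        return new IOException("HTTP " + httpStatus + " while opening " + path);
    }

    /** Contacts the server during open(), so a missing file fails here rather than on the first read. */
    static void openEagerly(StatusProbe probe, String path) throws IOException {
        IOException failure = toException(probe.fetchStatus(path), path);
        if (failure != null) {
            throw failure;
        }
    }
}
```

With this shape, callers such as {{LzoInputFormat.getSplits}} that rely on {{open}} failing for missing files would see the same behaviour over WebHDFS as over HDFS.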

This message was sent by Atlassian JIRA
