hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6143) WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
Date Tue, 08 Apr 2014 14:00:50 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962999#comment-13962999 ]

Daryn Sharp commented on HDFS-6143:
-----------------------------------

Unfortunately, this appears to introduce a performance penalty for jobs using webhdfs.

A task reading a file as job input opens the file and immediately seeks to the split location.
Currently, the lazy open only begins streaming from the seek offset.  Doesn't this patch cause
every map to begin streaming from offset 0, then seek (which closes the stream), and then
re-open and stream from the new offset?

If so, this adds unnecessary load and extra latency for a connection that will immediately be
closed, and wastes bandwidth on data that will be ignored, among other costs.  I think the patch
should be reverted.
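
To make the concern concrete, here is a toy model of the two behaviours. It is only a sketch,
not WebHdfsFileSystem code: ToyServer, ToyStream and map_task are hypothetical names, and the
model assumes that a seek to a different offset discards the current connection, as described
above. It records the offset at which each connection is opened.

```python
class ToyServer:
    """Stands in for a remote endpoint; records the offset of every connection."""

    def __init__(self, data):
        self.data = data
        self.opened_at = []

    def connect(self, offset):
        self.opened_at.append(offset)


class ToyStream:
    """Input stream that connects either at open (eager) or at first read (lazy)."""

    def __init__(self, server, eager):
        self.server = server
        self.pos = 0
        self.connected = False
        if eager:
            self._connect()  # eager open: connection starts streaming at offset 0

    def _connect(self):
        self.server.connect(self.pos)
        self.connected = True

    def seek(self, offset):
        if offset != self.pos:
            self.pos = offset
            self.connected = False  # assumption: seeking drops the open connection

    def read(self, n):
        if not self.connected:
            self._connect()  # (re)connect at the current position
        chunk = self.server.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk


def map_task(eager, split_offset=6):
    """Open, seek to the split, read; return the data and the connection offsets."""
    server = ToyServer(b"hello world")
    stream = ToyStream(server, eager)
    stream.seek(split_offset)
    return stream.read(5), server.opened_at
```

In this model the lazy variant opens one connection at the split offset, while the eager
variant opens a connection at offset 0 that is thrown away before any byte of it is used.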

> WebHdfsFileSystem open should throw FileNotFoundException for non-existing paths
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-6143
>                 URL: https://issues.apache.org/jira/browse/HDFS-6143
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>            Priority: Blocker
>             Fix For: 2.5.0
>
>         Attachments: HDFS-6143-branch-2.4.0.v01.patch, HDFS-6143-trunk-after-HDFS-5570.v01.patch,
> HDFS-6143-trunk-after-HDFS-5570.v02.patch, HDFS-6143.v01.patch, HDFS-6143.v02.patch, HDFS-6143.v03.patch,
> HDFS-6143.v04.patch, HDFS-6143.v04.patch, HDFS-6143.v05.patch, HDFS-6143.v06.patch
>
>
> WebHdfsFileSystem.open and HftpFileSystem.open incorrectly handle non-existing paths.
> - 'open' does not really open anything, i.e., it does not contact the server, and therefore
> cannot discover FileNotFound; the error is deferred until the next read. This is counterintuitive
> and not how local FS or HDFS work. In POSIX you get ENOENT on open. [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
> is an example of code that is broken because of this.
> - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead of SC_NOT_FOUND
> for non-existing paths.
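
The difference in when the error surfaces, as described in the issue, can be illustrated
with a small toy model. This is a sketch only: the dict-backed store and the names LazyFile,
open_lazy and open_eager are hypothetical and do not correspond to Hadoop APIs.

```python
class LazyFile:
    """Handle that defers the existence check until the first read."""

    def __init__(self, store, path):
        self.store = store
        self.path = path  # nothing is looked up here

    def read(self):
        if self.path not in self.store:
            raise FileNotFoundError(self.path)  # error only surfaces on read
        return self.store[self.path]


def open_lazy(store, path):
    """Mimics the reported behaviour: open never fails for a missing path."""
    return LazyFile(store, path)


def open_eager(store, path):
    """Mimics POSIX-like behaviour: a missing path fails at open time."""
    if path not in store:
        raise FileNotFoundError(path)
    return LazyFile(store, path)
```

A caller that relies on open to detect missing inputs works with open_eager but silently
proceeds with open_lazy until the first read.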



--
This message was sent by Atlassian JIRA
(v6.2#6252)
