hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-6214) Webhdfs has poor throughput for files >2GB
Date Wed, 16 Apr 2014 16:00:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971581#comment-13971581 ]

Kihwal Lee edited comment on HDFS-6214 at 4/16/14 3:59 PM:
-----------------------------------------------------------

Ok, if there is no performance degradation, I am fine with calling flush() for the non-chunked
case.

> So, if "io.file.buffer.size" is small enough, like 4K (the default), it may be overall
> slower, but there will be no difference for files > 2GB.
I take it back. It just means the client (block reader) will read 4K at a time from datanodes.
The 24KB response buffer will be filled in and flushed, so this issue is still visible with
the default buffer size.

+1 for the patch.
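
To illustrate the buffering interaction described above, here is a minimal sketch, not the actual WebHDFS or jetty code. It assumes a plain java.io.BufferedOutputStream as a stand-in for the 24KB response buffer and a 4K read buffer standing in for "io.file.buffer.size". The class and method names (BufferFlushSketch, CountingSink, run) are hypothetical. It only shows that 4K reads still fill the 24KB buffer and push full buffers to the underlying sink; it does not reproduce the throughput problem itself.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferFlushSketch {
    // Hypothetical sizes taken from the discussion: 4K reads, 24KB response buffer.
    static final int READ_SIZE = 4 * 1024;
    static final int RESPONSE_BUFFER_SIZE = 24 * 1024;

    /** Underlying sink that counts how many write calls actually reach it. */
    static class CountingSink extends OutputStream {
        int writes = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            writes++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writes++;
            bytes += len;
        }
    }

    /**
     * Copies totalBytes of dummy data through a 24KB buffered stream using
     * 4K reads, then flushes once at the end. Returns the number of reads.
     */
    static int run(CountingSink sink, int totalBytes) {
        InputStream in = new ByteArrayInputStream(new byte[totalBytes]);
        OutputStream out = new BufferedOutputStream(sink, RESPONSE_BUFFER_SIZE);
        byte[] buf = new byte[READ_SIZE];
        int reads = 0;
        try {
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n); // accumulates until the 24KB buffer is full
                reads++;
            }
            out.flush(); // explicit flush, analogous to flushing the response at the end
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return reads;
    }

    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        int reads = run(sink, 96 * 1024);
        System.out.println("reads=" + reads + " sinkWrites=" + sink.writes
                + " bytes=" + sink.bytes);
    }
}
```

With 96KB of input, the copy loop performs 24 reads of 4K each, while the sink sees only 4 writes of 24KB each. The small read size changes how often the loop iterates, not the size of what reaches the underlying stream.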


was (Author: kihwal):
Ok, if there was no performance degradation, I am fine with calling flush() for the non-chunked
case.

> So, if "io.file.buffer.size" is small enough, like 4K (the default), it may be overall
> slower, but there will be no difference for files > 2GB.
I take it back. It just means the client (block reader) will read 4K at a time from datanodes.
The 24KB response buffer will be filled in and flushed, so this issue is still visible with
the default buffer size.

+1 for the patch.

> Webhdfs has poor throughput for files >2GB
> ------------------------------------------
>
>                 Key: HDFS-6214
>                 URL: https://issues.apache.org/jira/browse/HDFS-6214
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-6214.patch
>
>
> For the DN's open call, jetty returns a Content-Length header for files <2GB, and
> uses chunking for files >2GB.  A "bug" in jetty's buffer handling results in a ~8X reduction
> in throughput.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
