hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3788) distcp can't copy large files using webhdfs due to missing Content-Length header
Date Mon, 13 Aug 2012 16:26:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433254#comment-13433254 ]

Daryn Sharp commented on HDFS-3788:
-----------------------------------

Supporting multiple grid versions makes this problem complex:
* the client needs either the content-length or chunking to reliably know when the file has been fully read
* if the response isn't chunked and there's no content-length, the client needs to obtain the content-length by other means, such as a file stat
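The decision described in the two points above can be sketched roughly as follows. This is only an illustration, not the webhdfs client code: the class `ResponseLength`, its methods, and the `statLength` callback are hypothetical names, and the header map is assumed to use lower-case keys.

```java
import java.util.Locale;
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Hadoop: decides how a client can tell
// that a response body has been fully read.
final class ResponseLength {
    enum Framing { CHUNKED, CONTENT_LENGTH, UNKNOWN }

    // Assumes header names were lower-cased by the caller.
    static Framing framing(Map<String, String> headers) {
        String te = headers.get("transfer-encoding");
        // In HTTP/1.1, chunked transfer-encoding takes precedence over content-length.
        if (te != null && te.toLowerCase(Locale.ROOT).contains("chunked")) {
            return Framing.CHUNKED;
        }
        if (headers.get("content-length") != null) {
            return Framing.CONTENT_LENGTH;
        }
        return Framing.UNKNOWN;
    }

    // Returns the number of bytes the client should expect, or empty when the
    // chunk framing itself marks the end of the body. statLength stands in for
    // "other means such as a file stat" and is only called as a last resort.
    static OptionalLong expectedLength(Map<String, String> headers, LongSupplier statLength) {
        switch (framing(headers)) {
            case CHUNKED:
                return OptionalLong.empty();
            case CONTENT_LENGTH:
                return OptionalLong.of(Long.parseLong(headers.get("content-length").trim()));
            default:
                return OptionalLong.of(statLength.getAsLong());
        }
    }
}
```

Note that a sketch like this trusts whatever content-length the server sends, so a server that reports 0 for a non-empty file would still mislead the client.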

Based on a quick glance, it looks like the current streaming servlet is explicitly setting
the content-length to 0. (That seems wrong, because it's not an empty file.) The puzzling
part is that I don't see how it works at all, for files either <2GB or >2GB! Java must be
implicitly setting the content-length when the stream is <2GB.
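One observation that may or may not be related: the 2GB boundary coincides with the largest value of a signed 32-bit Java int. The sketch below only shows what happens when a length above that limit passes through an int-typed value; it does not confirm that this is what the servlet does. The class `ContentLengthWidth` and its methods are hypothetical.

```java
// Hypothetical illustration, not Hadoop code: lengths above Integer.MAX_VALUE
// cannot be represented by any API that takes the length as an int.
final class ContentLengthWidth {
    // True if the length survives a round trip through an int unchanged.
    static boolean fitsInInt(long length) {
        return length >= 0 && length <= Integer.MAX_VALUE;
    }

    // Narrowing cast: silently wraps for values above Integer.MAX_VALUE.
    static int narrow(long length) {
        return (int) length;
    }
}
```

For a 3GB file (3L << 30 bytes), `fitsInInt` returns false and `narrow` yields a negative number, which a receiver could not treat as a valid length.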
                
> distcp can't copy large files using webhdfs due to missing Content-Length header
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-3788
>                 URL: https://issues.apache.org/jira/browse/HDFS-3788
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 0.23.3, 2.0.0-alpha
>            Reporter: Eli Collins
>            Priority: Critical
>         Attachments: distcp-webhdfs-errors.txt
>
>
> The following command fails when data1 contains a 3GB file. It passes when using hftp
> or when the directory contains only smaller (<2GB) files, so it looks like a webhdfs issue
> with large files.
> {{hadoop distcp webhdfs://eli-thinkpad:50070/user/eli/data1 hdfs://localhost:8020/user/eli/data2}}
