hadoop-hdfs-issues mailing list archives

From "Bill Zeller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-235) Add support for byte-ranges to hftp
Date Fri, 21 Aug 2009 01:14:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745761#action_12745761 ]

Bill Zeller commented on HDFS-235:
----------------------------------


h4. Overview
The code currently works as follows. 

# HftpFileSystem::open(path, bufferSize) issues a GET request to, e.g., http://namenode/data/path
# On the namenode, /data/path is handled by FileDataServlet. FileDataServlet chooses a datanode (using JspHelper.bestNode) and issues an HTTP redirect response pointing to the datanode (e.g., http://datanode/streamFile?filename=path&... )
# /streamFile?filename=path is called on the datanode, where it is handled by org.apache.hadoop.hdfs.server.namenode.StreamFile. StreamFile creates a DFSClient and serves the appropriate file.

To handle range requests, the following can be done:

* Modify /streamFile to handle range requests
* Modify the way FileDataServlet chooses a datanode (it should use the block locations in
the byte-range being requested, not the block locations for the entire file)
* Add a method to HftpFileSystem that takes one or more byte range arguments (depending on
the answer to the question below)
* Confirm that when HttpURLConnection follows redirects, it maintains headers. Specifically,
the Range header will need to be sent to the datanode after the redirect response comes back
from the namenode. 
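
As a rough illustration of the second bullet, the sketch below picks out only the blocks that overlap a requested byte-range, so that a datanode could be chosen from those blocks' locations rather than from the whole file's. BlockSpan and RangeBlockSelector are hypothetical names made up for this example; they are not Hadoop classes, and the real servlet would work with the block locations returned by the namenode.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a block's position within a file; not a Hadoop class. */
final class BlockSpan {
    final long offset;  // byte offset of the block's first byte within the file
    final long length;  // number of bytes in the block

    BlockSpan(long offset, long length) {
        this.offset = offset;
        this.length = length;
    }
}

/** Hypothetical helper illustrating range-aware block selection. */
final class RangeBlockSelector {
    /**
     * Returns the blocks that contain at least one byte of the inclusive
     * range [begin, end]. Only these blocks matter when picking a datanode
     * for a range request.
     */
    static List<BlockSpan> blocksInRange(List<BlockSpan> blocks, long begin, long end) {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("invalid range: " + begin + "-" + end);
        }
        List<BlockSpan> hits = new ArrayList<>();
        for (BlockSpan b : blocks) {
            long lastByte = b.offset + b.length - 1;  // inclusive end of this block
            if (b.length > 0 && b.offset <= end && lastByte >= begin) {
                hits.add(b);
            }
        }
        return hits;
    }
}
```

For a file with blocks at offsets 0, 64 and 128, a request for bytes 60-70 would select only the first two blocks.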


h4. Question:
The HTTP spec supports multiple byte-ranges. A request with multiple byte-ranges returns a multipart (MIME) response (see: http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.2 )
  This is different from a request that contains a single byte-range, which returns data in
the standard format, but with an additional Content-Range header (see: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16 )
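
To make the single byte-range case concrete, here is a minimal sketch of how a server might interpret a Range header that holds exactly one range and build the matching Content-Range value. SingleRange is a hypothetical helper written for this example, not a Hadoop or Jetty class, and it deliberately rejects multi-range headers instead of handling them.

```java
/** Hypothetical helper for a single inclusive byte-range; not part of Hadoop or Jetty. */
final class SingleRange {
    final long first;  // first byte to send, inclusive
    final long last;   // last byte to send, inclusive

    SingleRange(long first, long last) {
        this.first = first;
        this.last = last;
    }

    /**
     * Parses a Range header value holding exactly one range, such as
     * "bytes=0-499", "bytes=500-" or "bytes=-200", against a file of
     * fileLength bytes. Returns null if the header is malformed, holds
     * several ranges, or cannot be satisfied for this file.
     */
    static SingleRange parse(String header, long fileLength) {
        if (header == null || !header.startsWith("bytes=") || header.indexOf(',') >= 0) {
            return null;
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0) {
            return null;
        }
        String a = spec.substring(0, dash).trim();
        String b = spec.substring(dash + 1).trim();
        try {
            long first;
            long last;
            if (a.isEmpty()) {
                // Suffix form "-N": the last N bytes of the file.
                if (b.isEmpty()) {
                    return null;
                }
                long n = Long.parseLong(b);
                if (n <= 0) {
                    return null;
                }
                first = Math.max(0, fileLength - n);
                last = fileLength - 1;
            } else {
                first = Long.parseLong(a);
                // An open-ended or overlong range is clamped to the end of the file.
                last = b.isEmpty() ? fileLength - 1 : Math.min(Long.parseLong(b), fileLength - 1);
            }
            if (first < 0 || first > last) {
                return null;
            }
            return new SingleRange(first, last);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Number of bytes in the range, suitable for Content-Length. */
    long length() {
        return last - first + 1;
    }

    /** Content-Range header value for this range. */
    String contentRange(long fileLength) {
        return "bytes " + first + "-" + last + "/" + fileLength;
    }
}
```

For a 1000-byte file, SingleRange.parse("bytes=0-499", 1000).contentRange(1000) yields "bytes 0-499/1000", which would accompany a 206 (Partial Content) response.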
  
  There are three options that I see:
     a) Support only a single byte-range. This makes more sense to me from an API point of
view, since we can amend the following:
      HftpFileSystem::open(Path f, int buffersize)
      
      with
      
      HftpFileSystem::open(Path f, int buffersize, long begin, long end)
      
      ...which would read the file f from [begin,end].
      
     b) Support multiple byte-ranges. This would require ensuring that HttpURLConnection supports
multipart MIME responses (I don't know if it does). Supporting this would also lead to a more complicated
API, something like:
     
      HftpFileSystem::open(Path f, int buffersize, List<ByteRange> ranges)
      
      Also, because open() returns an FSDataInputStream, supporting multiple byte-ranges would
require either that reads from the FSDataInputStream return bytes from the different ranges
sequentially (leaving the client to figure out where each range begins and ends in the input
stream) or that open() be changed to return a list of input streams, one per byte-range.
      
    c) We could support multiple byte-ranges in StreamFile, but only support a single byte-range
in HftpFileSystem. 
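
To compare the options from the client's side, the sketch below shows how the Range header value could be built from one or more ranges, and what the "sequential" variant of option (b) would ask of a caller: splitting the concatenated bytes back into per-range chunks. ByteRange here is only a guess at the type named in the signature above, and RangeRequests is a hypothetical helper; neither exists in Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical value type for an inclusive byte range [begin, end]. */
final class ByteRange {
    final long begin;
    final long end;

    ByteRange(long begin, long end) {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("invalid range: " + begin + "-" + end);
        }
        this.begin = begin;
        this.end = end;
    }

    long length() {
        return end - begin + 1;
    }
}

/** Hypothetical client-side helpers for range requests. */
final class RangeRequests {
    /** Range header value for one range (option a) or several (option b). */
    static String headerValue(List<ByteRange> ranges) {
        if (ranges.isEmpty()) {
            throw new IllegalArgumentException("at least one range is required");
        }
        StringBuilder sb = new StringBuilder("bytes=");
        for (int i = 0; i < ranges.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(ranges.get(i).begin).append('-').append(ranges.get(i).end);
        }
        return sb.toString();
    }

    /**
     * Sequential variant of option (b): given the bytes of all ranges read
     * back to back, recover one chunk per requested range. This is the
     * bookkeeping the client would have to do itself.
     */
    static List<byte[]> split(byte[] concatenated, List<ByteRange> ranges) {
        List<byte[]> parts = new ArrayList<>();
        int pos = 0;
        for (ByteRange r : ranges) {
            int len = Math.toIntExact(r.length());
            if (len > concatenated.length - pos) {
                throw new IllegalArgumentException("stream shorter than requested ranges");
            }
            parts.add(Arrays.copyOfRange(concatenated, pos, pos + len));
            pos += len;
        }
        return parts;
    }
}
```

With option (a), headerValue is only ever called with a single range and split is unnecessary, which is part of why that API is simpler.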
    
h4. Implementation issues: 

To parse the Range headers, I plan to use a few utility classes included in Jetty. Specifically,
org.mortbay.jetty.InclusiveByteRange and org.mortbay.util.MultiPartOutputStream (but the latter
only if we decide to support multiple byte-ranges). Additionally, the logic used to handle
the byte-ranges will be heavily inspired by org.mortbay.jetty.servlet.DefaultServlet::sendData,
which is also licensed under the Apache License 2.0.

> Add support for byte-ranges to hftp
> -----------------------------------
>
>                 Key: HDFS-235
>                 URL: https://issues.apache.org/jira/browse/HDFS-235
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Venkatesh S
>            Assignee: Bill Zeller
>
> Support should be similar to http byte-serving.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

