hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-519) HDFS File API should be extended to include positional read
Date Mon, 18 Sep 2006 21:34:26 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-519?page=all ]

Doug Cutting updated HADOOP-519:
--------------------------------

    Status: Open  (was: Patch Available)

Overall this looks like a great addition that is well-implemented.  A few nits:

In DFSClient, you've duplicated a lot of code from blockSeekTo in fetchBlockByteRange.  Can
you perhaps instead add one or two more methods that capture this common code?

The javadoc for read(long, byte[], int, int) should say "read up to" or "attempt to read", since
it may not read all of the bytes (that's what readFully is for).
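
To illustrate the difference between the two contracts, here is a minimal sketch. It is not
the code from pread.patch: ShortReadSource and its maxChunk parameter are hypothetical names
that simulate a source which may return fewer bytes than requested, and readFully is shown as
a loop over read that either fills the requested range or fails with EOFException.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical in-memory source, used only to illustrate the two contracts.
class ShortReadSource {
    private final byte[] data;
    private final int maxChunk; // assumed >= 1; caps each read to simulate short reads

    ShortReadSource(byte[] data, int maxChunk) {
        this.data = data;
        this.maxChunk = maxChunk;
    }

    /** Reads up to length bytes starting at position; returns -1 at end of data. */
    int read(long position, byte[] buffer, int offset, int length) {
        if (position >= data.length) {
            return -1;
        }
        int available = data.length - (int) position;
        int n = Math.min(length, Math.min(maxChunk, available));
        System.arraycopy(data, (int) position, buffer, offset, n);
        return n;
    }

    /** Reads exactly length bytes starting at position, or throws EOFException. */
    void readFully(long position, byte[] buffer, int offset, int length) throws IOException {
        int done = 0;
        while (done < length) {
            int n = read(position + done, buffer, offset + done, length - done);
            if (n < 0) {
                throw new EOFException("end of data before " + length + " bytes were read");
            }
            done += n;
        }
    }
}
```

A caller that needs every byte uses readFully; a caller that can cope with a partial result
uses read and checks the returned count.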

The javadoc comments on the new FSInputStream methods do not add anything useful to what would
be inherited from the interface, and what they do add makes them inappropriate for inheritance
by subclasses.  So these should be removed.

> HDFS File API should be extended to include positional read
> -----------------------------------------------------------
>
>                 Key: HADOOP-519
>                 URL: http://issues.apache.org/jira/browse/HADOOP-519
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.6.0
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.7.0
>
>         Attachments: pread.patch
>
>
> HDFS input streams should support positional read. Positional read (such as the pread
> syscall on Linux) allows reading from a specified offset without affecting the current file
> offset. Since the underlying file state is not touched, pread can be used efficiently in
> multi-threaded programs.
> Here is how I plan to implement it.
> Provide a PositionedReadable interface with the following methods:
> int read(long position, byte[] buffer, int offset, int length);
> void readFully(long position, byte[] buffer, int offset, int length);
> void readFully(long position, byte[] buffer);
> The abstract class FSInputStream would provide a default implementation of the above methods
> using the getPos(), seek() and read() methods. The default implementation is inefficient in
> multi-threaded programs since it locks the object while seeking, reading, and restoring the
> old state.
> DFSClient.DFSInputStream, which extends FSInputStream, will provide an efficient
> non-synchronized implementation of the above calls.
> In addition, FSDataInputStream, which is a wrapper around FSInputStream, will provide
> wrapper methods for the above read methods as well.
> Patch forthcoming early next week.
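
The default implementation described in the quoted plan can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code from the patch:
SketchInputStream stands in for FSInputStream, ByteArraySketchStream is a hypothetical
in-memory subclass added so the example runs, and only the first of the three proposed
methods is shown.

```java
import java.io.IOException;

// Interface name and method signature follow the quoted proposal.
interface PositionedReadable {
    int read(long position, byte[] buffer, int offset, int length) throws IOException;
}

// Hypothetical stand-in for FSInputStream; not the real Hadoop class.
abstract class SketchInputStream implements PositionedReadable {
    abstract long getPos() throws IOException;
    abstract void seek(long pos) throws IOException;
    abstract int read(byte[] buffer, int offset, int length) throws IOException;

    // Default positional read: save the position, seek, read, then restore.
    // The method is synchronized, so concurrent callers are serialized on this object.
    @Override
    public synchronized int read(long position, byte[] buffer, int offset, int length)
            throws IOException {
        long oldPos = getPos();
        try {
            seek(position);
            return read(buffer, offset, length);
        } finally {
            seek(oldPos); // restore the stream's own offset even if the read fails
        }
    }
}

// Hypothetical in-memory subclass so that the sketch can be exercised.
class ByteArraySketchStream extends SketchInputStream {
    private final byte[] data;
    private long pos;

    ByteArraySketchStream(byte[] data) {
        this.data = data;
    }

    @Override
    long getPos() {
        return pos;
    }

    @Override
    void seek(long newPos) {
        pos = newPos;
    }

    @Override
    int read(byte[] buffer, int offset, int length) {
        if (pos >= data.length) {
            return -1;
        }
        int n = (int) Math.min(length, data.length - pos);
        System.arraycopy(data, (int) pos, buffer, offset, n);
        pos += n;
        return n;
    }
}
```

After a positional read the stream's own offset is unchanged, which is the property the
proposal asks for. The lock held across seek, read, and restore is the cost that a
non-synchronized DFSInputStream implementation is meant to avoid.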

