hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-519) HDFS File API should be extended to include positional read
Date Mon, 11 Sep 2006 18:40:25 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-519?page=comments#action_12433929 ] 
            
Milind Bhandarkar commented on HADOOP-519:
------------------------------------------

>>So, to be clear, you will modify both FSInputStream and FSDataInputStream to implement
>>the new PositionedReadable interface, right?

Yes.

>>Only the primitive FSInputStream.read(long,byte[],int,int) method needs to be synchronized.
>>The others can be unsynchronized and implemented only in base classes, inherited by
>>optimized subclasses.

That's right. 

>>An optimized implementation of read(long,byte[],int,int) can be provided in both DFSInputStream
>>and LocalFSInputStream (the latter using nio's FileChannel.read(ByteBuffer,long)).
>>It might be simpler if the PositionedReadable API were instead read(ByteBuffer, long), so that
>>the client can manage ByteBuffer allocation.

I was going to make LocalFSInputStream use the default synchronized implementation. I will
study the solution you have suggested and see whether it offers any tangible benefits.
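
For reference, here is a minimal sketch of what an nio-based positional read on a local file could look like. The class name LocalPositionedReader and its methods are hypothetical and are not part of Hadoop; the only library call relied on is FileChannel.read(ByteBuffer, long), which reads at an absolute position without changing the channel's own position.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not a Hadoop class.
final class LocalPositionedReader implements AutoCloseable {
    private final FileChannel channel;

    LocalPositionedReader(Path file) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    // Reads up to `length` bytes starting at the absolute file `position`.
    // Returns the number of bytes read, or -1 if `position` is at or past
    // the end of the file. The channel's current position is left untouched,
    // so no explicit seek/restore (and no lock around it) is needed here.
    int read(long position, byte[] buffer, int offset, int length) throws IOException {
        return channel.read(ByteBuffer.wrap(buffer, offset, length), position);
    }

    // Exposes the channel's own position so callers can confirm it is unchanged.
    long currentPosition() throws IOException {
        return channel.position();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Wrapping the caller's byte[] in a ByteBuffer keeps the byte[]-based signature from the proposal; an API that took a ByteBuffer directly would skip the wrap and leave allocation to the client.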


>>Finally, we should change implementations of read(byte[],int,int) and seek(long) to
>>be synchronized. This won't hurt, since they're not currently thread safe, and it
>>will make the positioned-read methods thread-safe.

Yes.

> HDFS File API should be extended to include positional read
> -----------------------------------------------------------
>
>                 Key: HADOOP-519
>                 URL: http://issues.apache.org/jira/browse/HADOOP-519
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.6.0
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.7.0
>
>
> HDFS input streams should support positional read. Positional read (such as the pread
> syscall on Linux) allows reading from a specified offset without affecting the current file
> offset. Since the underlying file state is not touched, pread can be used efficiently in
> multi-threaded programs.
> Here is how I plan to implement it.
> Provide PositionedReadable interface, with the following methods:
> int read(long position, byte[] buffer, int offset, int length);
> void readFully(long position, byte[] buffer, int offset, int length);
> void readFully(long position, byte[] buffer);
> Abstract class FSInputStream would provide a default implementation of the above methods
> using the getPos(), seek() and read() methods. The default implementation is inefficient in
> multi-threaded programs, since it locks the object while seeking, reading, and restoring the
> old position.
> DFSClient.DFSInputStream, which extends FSInputStream, will provide an efficient non-synchronized
> implementation of the above calls.
> In addition, FSDataInputStream, which is a wrapper around FSInputStream, will provide
> wrapper methods for the above read methods as well.
> Patch forthcoming early next week.
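
The default implementation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the forthcoming patch: the interface reuses the three method signatures listed in the description, while SeekingInputSketch and ByteArrayInputSketch are hypothetical names, and the in-memory subclass exists only so the sketch runs on its own.

```java
import java.io.EOFException;
import java.io.IOException;

// The three methods listed in the description (checked exceptions assumed).
interface PositionedReadable {
    int read(long position, byte[] buffer, int offset, int length) throws IOException;
    void readFully(long position, byte[] buffer, int offset, int length) throws IOException;
    void readFully(long position, byte[] buffer) throws IOException;
}

// Hypothetical base class standing in for the described default behaviour.
abstract class SeekingInputSketch implements PositionedReadable {
    abstract long getPos() throws IOException;
    abstract void seek(long pos) throws IOException;
    abstract int read(byte[] buffer, int offset, int length) throws IOException;

    // Primitive positioned read: remember the position, seek, read, restore.
    // The lock is held for all three steps, which is why this default is
    // slow when many threads share one stream.
    @Override
    public synchronized int read(long position, byte[] buffer, int offset, int length)
            throws IOException {
        long oldPos = getPos();
        try {
            seek(position);
            return read(buffer, offset, length);
        } finally {
            seek(oldPos);
        }
    }

    // Built on the primitive above, so it needs no lock of its own.
    @Override
    public void readFully(long position, byte[] buffer, int offset, int length)
            throws IOException {
        int done = 0;
        while (done < length) {
            int n = read(position + done, buffer, offset + done, length - done);
            if (n < 0) {
                throw new EOFException("End of stream reached before reading fully");
            }
            done += n;
        }
    }

    @Override
    public void readFully(long position, byte[] buffer) throws IOException {
        readFully(position, buffer, 0, buffer.length);
    }
}

// Hypothetical in-memory stream used only to exercise the sketch.
final class ByteArrayInputSketch extends SeekingInputSketch {
    private final byte[] data;
    private int pos;

    ByteArrayInputSketch(byte[] data) {
        this.data = data;
    }

    @Override
    synchronized long getPos() {
        return pos;
    }

    @Override
    synchronized void seek(long newPos) {
        pos = (int) newPos;
    }

    @Override
    synchronized int read(byte[] buffer, int offset, int length) {
        if (pos >= data.length) {
            return -1; // end of stream
        }
        int n = Math.min(length, data.length - pos);
        System.arraycopy(data, pos, buffer, offset, n);
        pos += n;
        return n;
    }
}
```

Only the primitive read(long, byte[], int, int) is synchronized here; a subclass with a true positional read could override just that one method and inherit both readFully variants unchanged.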

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
