hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-894) dfs client protocol should allow asking for parts of the block map
Date Wed, 21 Mar 2007 01:53:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482631

Konstantin Shvachko commented on HADOOP-894:

As I understand the problem, many clients open the same file and read its first block,
e.g. in streaming, and then each client reads a specific part of the file. So each client
does not need to receive the block map for the whole file; it only needs the block
locations for a specified range.

I propose modifying ClientProtocol.open() to
OpenFileInfo open( String src, int numBlocks )
src - the path;
numBlocks - the number of blocks whose locations the client wants open() to compute.
OpenFileInfo extends DFSFileInfo {
    LocatedBlock[ numBlocks ];
}
DFSFileInfo contains file information, including file length and replication.

ClientProtocol should also contain
public LocatedBlock[] getBlockLocations(String src, int offset, int length) throws IOException;
offset - is the starting offset in the file
length - is the number of bytes the client is supposed to read

class LocatedBlock should include an additional field
+ long startFrom;  which gives the offset within the block at which the desired region of bytes starts.
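
To make the range lookup concrete, here is a minimal sketch of how a byte range could be mapped to blocks and to startFrom. It assumes a fixed block size, uses long for offset and length (the proposal above says int), and treats startFrom as non-zero only for the first block in the range. LocatedBlockSketch and BlockRangeSketch are hypothetical names for illustration, not existing Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for LocatedBlock; only the fields needed here. */
final class LocatedBlockSketch {
    final int blockIndex;  // index of the block within the file
    final long startFrom;  // offset within the block to the desired bytes

    LocatedBlockSketch(int blockIndex, long startFrom) {
        this.blockIndex = blockIndex;
        this.startFrom = startFrom;
    }
}

final class BlockRangeSketch {
    /**
     * Returns the blocks overlapping [offset, offset + length), assuming
     * every block except possibly the last holds exactly blockSize bytes.
     */
    static List<LocatedBlockSketch> getBlockLocations(
            long fileLength, long blockSize, long offset, long length) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<LocatedBlockSketch> result = new ArrayList<>();
        if (length <= 0 || offset < 0 || offset >= fileLength) {
            return result;  // nothing to read
        }
        long end = Math.min(offset + length, fileLength);  // exclusive
        int first = (int) (offset / blockSize);
        int last = (int) ((end - 1) / blockSize);
        for (int i = first; i <= last; i++) {
            // Only the first block is entered part-way through.
            long startFrom = (i == first) ? offset % blockSize : 0L;
            result.add(new LocatedBlockSketch(i, startFrom));
        }
        return result;
    }

    public static void main(String[] args) {
        for (LocatedBlockSketch b : getBlockLocations(250L, 100L, 150L, 80L)) {
            System.out.println("block " + b.blockIndex + " startFrom " + b.startFrom);
        }
    }
}
```

With a 250-byte file and 100-byte blocks, a request for 80 bytes at offset 150 touches block 1 (startFrom 50) and block 2 (startFrom 0).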

Then we will need to reimplement seeks and reads for DFSInputStream using that API.
What would be a good default for the number of blocks that getBlockLocations()
would fetch per call if the file is read from start to finish?
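
One way to reason about that default is to look at how the window size trades the number of calls against the size of each reply. The sketch below is only an illustration under assumptions: LocationSource is a hypothetical stand-in for the getBlockLocations() call, locations are plain strings, and the reader fetches a fixed window of blocks whenever it reaches a block that is not cached.

```java
import java.util.HashMap;
import java.util.Map;

final class PrefetchSketch {
    /** Hypothetical stand-in for the namenode call; not a Hadoop interface. */
    interface LocationSource {
        String[] fetch(int firstBlock, int count);
    }

    private final LocationSource source;
    private final int totalBlocks;
    private final int window;  // blocks requested per call
    private final Map<Integer, String> cache = new HashMap<>();
    int calls = 0;             // number of fetches issued so far

    PrefetchSketch(LocationSource source, int totalBlocks, int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.source = source;
        this.totalBlocks = totalBlocks;
        this.window = window;
    }

    /** Returns the location of one block, fetching a window on a cache miss. */
    String locate(int blockIndex) {
        if (blockIndex < 0 || blockIndex >= totalBlocks) {
            throw new IndexOutOfBoundsException("block " + blockIndex);
        }
        String location = cache.get(blockIndex);
        if (location == null) {
            int count = Math.min(window, totalBlocks - blockIndex);
            String[] fetched = source.fetch(blockIndex, count);
            calls++;
            for (int i = 0; i < fetched.length; i++) {
                cache.put(blockIndex + i, fetched[i]);
            }
            location = cache.get(blockIndex);
        }
        return location;
    }
}
```

In this model, reading a 25-block file from start to finish with a window of 10 issues 3 calls, while a window of 1 issues 25; a client that reads only one region pays for at most one window.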

> dfs client protocol should allow asking for parts of the block map
> ------------------------------------------------------------------
>                 Key: HADOOP-894
>                 URL: https://issues.apache.org/jira/browse/HADOOP-894
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>         Assigned To: Wendy Chien
> I think that the HDFS client protocol should change like:
> /** The meta-data about a file that was opened. */
> class OpenFileInfo {
>   /** the info for the first block */
>   public LocatedBlockInfo getBlockInfo();
>   public long getBlockSize();
>   public long getLength();
> }
> interface ClientProtocol extends VersionedProtocol {
>   public OpenFileInfo open(String name) throws IOException;
>   /** get block info for any range of blocks */
>   public LocatedBlockInfo[] getBlockInfo(String name, int blockOffset, int blockLength) throws IOException;
> }
> so that the client can decide how much block info to request and when. Currently, when
> the file is opened or an error occurs, the entire block list is requested and sent.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
