hadoop-common-dev mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1296) Improve interface to FileSystem.getFileCacheHints
Date Thu, 26 Apr 2007 04:10:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491870 ]

eric baldeschwieler commented on HADOOP-1296:
---------------------------------------------

So, are we comfortable returning hundreds of thousands of records in a single RPC from the
name node? Would it be better to return at most 10k records at a time (or some similar limit)
with a clear restart policy? Or is it OK for a client to open a socket and suck in that much
data in one session? More RPCs clearly means more aggregate work; I'm just wondering about
starvation, locking, CPU spikes and all the usual suspects.
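A capped reply with a restart policy could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not Hadoop code: `BlockInfo`, `PagedBlockSource`, `fetchPage` and `fetchAll` are hypothetical names, and the restart rule (resume at the end offset of the last block received, stop on a short page) is one possible policy. Because each call carries its own restart offset, the name node keeps no per-client cursor between calls, at the cost of one extra RPC per page.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a paged block-location query with an explicit
 * restart policy. None of these names are part of the Hadoop API.
 */
public class PagedBlockFetch {

    /** Hypothetical stand-in for a block record: start offset, length, hosts. */
    record BlockInfo(long start, long length, String[] hosts) {}

    /** Hypothetical server side: at most maxRecords blocks with start >= offset. */
    interface PagedBlockSource {
        List<BlockInfo> fetchPage(long offset, int maxRecords);
    }

    /**
     * Client loop: each call resumes at the end of the last block received,
     * so the server holds no session state between calls.
     */
    static List<BlockInfo> fetchAll(PagedBlockSource source, int maxPerCall) {
        if (maxPerCall <= 0) {
            throw new IllegalArgumentException("maxPerCall must be positive");
        }
        List<BlockInfo> all = new ArrayList<>();
        long offset = 0;
        while (true) {
            List<BlockInfo> page = source.fetchPage(offset, maxPerCall);
            all.addAll(page);
            if (page.size() < maxPerCall) {
                return all; // a short (or empty) page means no more blocks
            }
            BlockInfo last = page.get(page.size() - 1);
            offset = last.start() + last.length(); // restart point for the next call
        }
    }

    /** In-memory source over blocks sorted by start offset, for illustration. */
    static PagedBlockSource inMemory(List<BlockInfo> blocks) {
        return (offset, max) -> {
            List<BlockInfo> page = new ArrayList<>();
            for (BlockInfo b : blocks) {
                if (b.start() >= offset && page.size() < max) {
                    page.add(b);
                }
            }
            return page;
        };
    }

    public static void main(String[] args) {
        // 25 contiguous 64-byte blocks with made-up host names.
        List<BlockInfo> blocks = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            blocks.add(new BlockInfo(i * 64L, 64L, new String[] {"host" + (i % 3)}));
        }
        int[] calls = {0};
        PagedBlockSource counting = (offset, max) -> {
            calls[0]++; // count simulated RPCs
            return inMemory(blocks).fetchPage(offset, max);
        };
        List<BlockInfo> result = fetchAll(counting, 10);
        System.out.println(result.size() + " blocks in " + calls[0] + " calls");
    }
}
```

With a cap of 10, the 25 blocks arrive in three calls (10, 10 and a short page of 5); with a cap of 10k, a file of a few hundred thousand blocks would take a few dozen calls.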

> Improve interface to FileSystem.getFileCacheHints
> -------------------------------------------------
>
>                 Key: HADOOP-1296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1296
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Owen O'Malley
>         Assigned To: dhruba borthakur
>
> The FileSystem interface provides a very limited interface for finding the location of the data. The current method looks like:
> String[][] getFileCacheHints(Path file, long start, long len) throws IOException
> which returns a list of "block info" where the block info consists of a list of host names. Because the hints don't include the information about where the block boundaries are, map/reduce is required to call the name node for each split. I'd propose that we fix the naming a bit and make it:
> public class BlockInfo implements Writable {
>   public long getStart();
>   public String[] getHosts();
> }
> BlockInfo[] getFileHints(Path file, long start, long len) throws IOException;
> so that map/reduce can query about the entire file and get the locations in a single call.
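To illustrate why start offsets matter, the sketch below shows how a client holding one whole-file answer could find the hosts for every split locally, without going back to the name node per split. It is a hypothetical illustration, not the proposed API: `BlockInfo` here is a plain record mirroring the proposal's `getStart()`/`getHosts()`, `hostsFor` is an invented helper, and assigning a split to the block containing its start offset is an assumed simplification.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: resolve split offsets to hosts from a single
 * whole-file list of blocks sorted by start offset.
 */
public class SplitHosts {

    /** Hypothetical mirror of the proposed BlockInfo (start offset + hosts). */
    record BlockInfo(long start, String[] hosts) {}

    /**
     * Returns the hosts of the block containing offset. Assumes blocks are
     * sorted by start and the first block starts at or before offset.
     */
    static String[] hostsFor(BlockInfo[] blocks, long offset) {
        if (blocks.length == 0) {
            throw new IllegalArgumentException("no blocks");
        }
        int lo = 0;
        int hi = blocks.length - 1;
        // Binary search for the last block whose start <= offset.
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (blocks[mid].start() <= offset) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return blocks[lo].hosts();
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // One call's worth of hints for a 3-block file (64 MB blocks, made-up hosts).
        BlockInfo[] blocks = {
            new BlockInfo(0, new String[] {"nodeA", "nodeB"}),
            new BlockInfo(64 * mb, new String[] {"nodeB", "nodeC"}),
            new BlockInfo(128 * mb, new String[] {"nodeC", "nodeA"}),
        };
        long splitSize = 32 * mb;
        for (long split = 0; split < 192 * mb; split += splitSize) {
            System.out.println("split@" + split / mb + "MB -> "
                + Arrays.toString(hostsFor(blocks, split)));
        }
    }
}
```

Each lookup is a binary search over the block list, so a file with many splits costs one name node call plus cheap local work, rather than one call per split.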

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

