hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Creating splits/tasks at the client
Date Fri, 29 Sep 2006 16:22:45 GMT
Owen O'Malley wrote:
> Of course, once we allow user-defined InputSplits we 
> will be back in exactly the same boat of running user-code on the 
> JobTracker, unless we also ship over the preferred hosts for each 
> InputFormat too.

So, to keep user code out of the job tracker entirely, we'd need a final 
class that represents each task to be created: a SplitLocations.  These 
would correspond 1-1 to splits, but would contain only the list of 
preferred hosts.  One way to implement this might be to write two parallel 
files in DFS, one with the SplitLocations and one with the Splits. 
The first would be passed to the job tracker along with the name of the 
second.  Only task child processes would then open the split file, seeking 
to the appropriate index.  We could use ArrayFile for these, and highly 
replicate them, especially their indexes.
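
A rough sketch of the two-parallel-files idea, in plain Java with no Hadoop
dependencies.  Everything here is an assumption made for illustration: the
shape of SplitLocations, the IndexedFile class (an in-memory stand-in for the
role ArrayFile would play, not its actual API), the record layout, and the
host and split strings.  It only aims to show that the job tracker side can
read preferred hosts without touching split bytes, while a task child seeks
straight to its own split by index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SplitFilesSketch {
    public static void main(String[] args) {
        // Client side: write two parallel files, record i of each describing task i.
        // The split payloads are opaque bytes; these strings are made-up examples.
        IndexedFile splitFile = IndexedFile.write(Arrays.asList(
            "part-0:0+64".getBytes(StandardCharsets.UTF_8),
            "part-0:64+64".getBytes(StandardCharsets.UTF_8)));
        IndexedFile locationFile = IndexedFile.write(Arrays.asList(
            new SplitLocations(Arrays.asList("host1", "host2")).toBytes(),
            new SplitLocations(Arrays.asList("host3")).toBytes()));

        // Job tracker side: reads only the locations file, never user split classes.
        for (int task = 0; task < locationFile.size(); task++) {
            SplitLocations loc = SplitLocations.fromBytes(locationFile.read(task));
            System.out.println("task " + task + " prefers " + loc.getHosts());
        }

        // Task child side: seeks to its own index in the split file.
        String split = new String(splitFile.read(1), StandardCharsets.UTF_8);
        System.out.println("task 1 reads split " + split);
    }
}

// Hypothetical final class: holds only the preferred hosts for one task.
final class SplitLocations {
    private final List<String> hosts;

    SplitLocations(List<String> hosts) {
        this.hosts = Collections.unmodifiableList(new ArrayList<>(hosts));
    }

    List<String> getHosts() { return hosts; }

    byte[] toBytes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(hosts.size());
            for (String host : hosts) out.writeUTF(host);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    static SplitLocations fromBytes(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int n = in.readInt();
            List<String> hosts = new ArrayList<>(n);
            for (int i = 0; i < n; i++) hosts.add(in.readUTF());
            return new SplitLocations(hosts);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}

// In-memory stand-in for an indexed record file: length-prefixed records
// plus an index of the byte offset where each record starts.
final class IndexedFile {
    private final byte[] data;
    private final int[] offsets;

    private IndexedFile(byte[] data, int[] offsets) {
        this.data = data;
        this.offsets = offsets;
    }

    static IndexedFile write(List<byte[]> records) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            int[] offsets = new int[records.size()];
            for (int i = 0; i < records.size(); i++) {
                offsets[i] = out.size();           // bytes written so far
                out.writeInt(records.get(i).length);
                out.write(records.get(i));
            }
            out.flush();
            return new IndexedFile(buf.toByteArray(), offsets);
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    int size() { return offsets.length; }

    // "Seek" to record `index` via the offset index and read just that record.
    byte[] read(int index) {
        try {
            int start = offsets[index];
            DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(data, start, data.length - start));
            byte[] record = new byte[in.readInt()];
            in.readFully(record);
            return record;
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

In a real job the two files would live in DFS and only the offset index would
need to be small and widely replicated; the sketch keeps both in memory purely
so it runs on its own.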

Doug
