hadoop-common-dev mailing list archives

From:    Andrzej Bialecki <...@getopt.org>
Subject: Re: Forcing all blocks to be present "locally"
Date:    Mon, 25 Sep 2006 21:32:36 GMT
Bryan A. P. Pendleton wrote:
> Would the "replication" parameter be sufficient for you? This will allow
> you to push the system to make a copy of each block in a file onto a
> larger set of nodes, possibly equal to the number of nodes in your
> cluster. Of course, this saves no space over local copying, but it does
> mean that you won't have to do the copy manually, and local access should
> be sped up.
>
> Just use "hadoop dfs -setrep -R # /path/to/criticalfiles", where # = your
> cluster size. This assumes you're running a DataNode on each node that
> you want the copies made to (and, well, that the nodes doing lookups ==
> the nodes running datanodes, or else you'll end up with extra copies).

No, I don't think this would help ... I don't want to replicate each
segment to all nodes, and I can't afford to - that would quickly exhaust
the total capacity of the cluster. If I set the replication factor lower
than the size of the cluster, then again I have no guarantee that whole
files are present locally.
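
(Rough arithmetic, with made-up numbers just to illustrate the scaling: 20
nodes and 500 GB of segment data at replication 20 means 20 x 500 GB =
10 TB of raw storage for the segments alone - the cost grows linearly with
the replication factor.)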

Let's say I have 3 segments and I want to run 3 map tasks, each with its
own segment data. The idea is that I want to make sure that task1,
executing on node1, will have all blocks from segment1 on node1's local
disk; and likewise for task2 and task3.
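
To make "all blocks local" concrete, here is a rough sketch of such a
check. Treat it as illustration only: FileSystem#getFileBlockLocations and
BlockLocation#getHosts come from later Hadoop releases than the one we're
discussing here, it checks a single file (a segment directory would need
to recurse over its files), and the hostname comparison is simplified.

    import java.io.IOException;
    import java.net.InetAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SegmentLocalityCheck {

      // Returns true iff every block of 'file' has a replica on 'host'.
      // Note: getHosts() returns the DataNode hostnames as the NameNode
      // knows them, which may not match the local hostname exactly.
      static boolean allBlocksLocal(FileSystem fs, Path file, String host)
          throws IOException {
        FileStatus stat = fs.getFileStatus(file);
        BlockLocation[] blocks =
            fs.getFileBlockLocations(stat, 0, stat.getLen());
        for (BlockLocation block : blocks) {
          boolean onHost = false;
          for (String h : block.getHosts()) {
            if (h.equals(host)) { onHost = true; break; }
          }
          if (!onHost) return false;  // at least one block is remote
        }
        return true;
      }

      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        String here = InetAddress.getLocalHost().getHostName();
        System.out.println(allBlocksLocal(fs, new Path(args[0]), here)
            ? "all blocks local" : "some blocks remote");
      }
    }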

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
