hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: Forcing all blocks to be present "locally"
Date Mon, 25 Sep 2006 23:34:56 GMT
You might try setting the block size for these files to be "very
large". This should guarantee that the entire file ends up on one node.
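
As a rough illustration of why this can work: a file occupies
ceil(file size / block size) blocks, so once the block size is at least
the file size there is only one block, and a single block is never split
across DataNodes. A minimal Python sketch of that arithmetic follows. The
sizes are made-up examples, not values from this thread or Hadoop
defaults, and block_count is a hypothetical helper, not a Hadoop API.

```python
def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Return how many fixed-size blocks a file of the given size occupies."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    if file_size_bytes < 0:
        raise ValueError("file size must not be negative")
    # Integer ceiling division; avoids floating-point rounding on large sizes.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


MiB = 1024 ** 2
GiB = 1024 ** 3

# Hypothetical 5 GiB index file (illustrative size only).
index_size = 5 * GiB

print(block_count(index_size, 64 * MiB))  # 80 blocks, possibly spread over many nodes
print(block_count(index_size, 8 * GiB))   # 1 block, stored whole wherever it is placed
```

Note that this only shows the file collapsing to one block; which node
receives that block is still up to the DFS.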

If an index is composed of many files, you could "tar" them together  
so each index is exactly one file.
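
One possible way to do the bundling, sketched with Python's standard
tarfile module. The function name bundle_index and the paths in the
usage line are hypothetical; the archive is written uncompressed here
purely as a choice for the sketch, and it would still have to be copied
into the DFS afterwards.

```python
import tarfile
from pathlib import Path


def bundle_index(index_dir: str, archive_path: str) -> int:
    """Pack every regular file under index_dir into one uncompressed tar.

    Returns the number of files added. Member names are stored relative
    to index_dir so the archive can be unpacked anywhere.
    """
    root = Path(index_dir)
    # Collect the file list before the archive is created, so the archive
    # itself is never picked up even if it is written inside index_dir.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w") as tar:
        for path in files:
            tar.add(str(path), arcname=path.relative_to(root).as_posix())
    return len(files)


# Hypothetical usage:
# bundle_index("/data/segment1/index", "/data/segment1-index.tar")
```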

Might work...  Of course as indexes get really large, this approach  
might have side effects.


On Sep 25, 2006, at 2:32 PM, Andrzej Bialecki wrote:

> Bryan A. P. Pendleton wrote:
>> Would the "replication" parameter be sufficient for you? This will allow you
>> to push the system to make a copy of each block in a file on a higher set of
>> nodes, possibly equal to the number of nodes in your cluster. Of course,
>> this saves no space over local copying, but it does mean that you won't have
>> to do the copy manually, and local-access should be sped up.
>>
>> Just use "hadoop dfs -setrep -R # /path/to/criticalfiles" where # = your
>> cluster size. This assumes you're running a DataNode on each node that you
>> want the copies made to (and, well, that the nodes doing lookups == the
>> nodes running datanodes, or else you'll end up with extra copies).
>
> No, I don't think this would help ... I don't want to replicate  
> each segment to all nodes, I can't afford it - this would quickly  
> exhaust the total capacity of the cluster. If I set the replication  
> factor lower than the size of the cluster, then again I have no  
> guarantee that whole files are present locally.
>
> Let's say I have 3 segments, and I want to run 3 map tasks, each  
> with its own segment data. The idea is that I want to make sure  
> that task1 executing on node1 will have all blocks from segment1 on  
> the local disk of node1; and the same for task2, task3 and so on.
>
> -- 
> Best regards,
> Andrzej Bialecki     <><
> ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>

