hadoop-common-dev mailing list archives

From: Andrzej Bialecki <...@getopt.org>
Subject: Re: Forcing all blocks to be present "locally"
Date: Tue, 26 Sep 2006 00:18:53 GMT
Eric Baldeschwieler wrote:
> You might try setting the block size for these files to be "very
> large". This should guarantee that the entire file ends up on one node.
> If an index is composed of many files, you could "tar" them together
> so each index is exactly one file.
> Might work... Of course, as indexes get really large, this approach
> might have side effects.

Sorry to be so obstinate, but this won't work either. First, when
segments are created they use whatever the default block size is
(64 MB?). Is there a per-file setBlockSize in the API? I couldn't find
it - and if there isn't, the cluster would have to be shut down,
reconfigured and restarted, and the segment data copied over again just
to change its block size ... yuck.
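
(For illustration only - a sketch, not something I've tested: later
FileSystem APIs do grow a create() overload that takes a per-file
replication and block size, in which case the override would look
roughly like this. The class name and the 8 GB figure are made up.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BigBlockWrite {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Pick a block size larger than the file itself, so the file
        // is a single block and therefore lives whole on one datanode.
        long blockSize = 8L * 1024 * 1024 * 1024;  // 8 GB (illustrative)
        FSDataOutputStream out =
            fs.create(new Path(args[0]),
                      true,        // overwrite
                      64 * 1024,   // io buffer size
                      (short) 3,   // replication
                      blockSize);  // per-file block size
        // ... write the index/segment bytes here ...
        out.close();
      }
    }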

The index cannot be tar-ed, because Lucene needs direct access to the
several files that make up the index.
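
To make that concrete: an index directory is a pile of cooperating
files (.fnm, .fdt/.fdx, .tis/.tii, .frq, .prx, segments, ...) that the
IndexReader opens and seeks into individually, so a single tar archive
is useless to it. A trivial sketch that lists them:

    import java.io.File;

    public class ListIndexFiles {
      public static void main(String[] args) {
        // Each Lucene segment is a set of files the reader must open
        // and seek into directly; none of them can be read through tar.
        for (File f : new File(args[0]).listFiles()) {
          System.out.println(f.getName() + "\t" + f.length() + " bytes");
        }
      }
    }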

Index sizes are several gigabytes, with ~30 files per segment. Segment
data is several tens of gigabytes, in 4 MapFiles per segment.
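
And the access pattern is what makes locality matter: a segment lookup
is a random get() against a MapFile, roughly as in the sketch below
(the key/value classes are guesses - they depend on how the MapFiles
were written), so when the blocks behind the "data" file sit on a
remote node, every lookup pays a network round trip:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;

    public class SegmentLookup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // A MapFile is a directory holding "data" + "index" files;
        // get() seeks straight into "data" using the in-memory index.
        MapFile.Reader reader = new MapFile.Reader(fs, args[0], conf);
        Text value = new Text();
        if (reader.get(new Text(args[1]), value) != null) {
          System.out.println(value);
        }
        reader.close();
      }
    }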

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
