hadoop-common-dev mailing list archives

From Sameer Paranjpye <same...@yahoo-inc.com>
Subject Re: Forcing all blocks to be present "locally"
Date Tue, 26 Sep 2006 01:04:01 GMT
The DFSClient API has methods to set the block size when a file is created.

From DFSClient.java:
FSOutputStream create(UTF8 src, boolean overwrite, short replication,
                       long blockSize) throws IOException


FSOutputStream create(UTF8 src, boolean overwrite, short replication,
                       long blockSize, Progressable progress) throws 
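
As a rough illustration of how a caller might pick the blockSize argument for the "very large block" idea discussed in this thread, here is a minimal sketch. The class name BlockSizeSketch, the helper singleBlockSizeFor, and the 64 MB constant are hypothetical and not part of DFSClient; rounding up to a multiple of the default is an assumption, not a documented requirement. Whether a single block really keeps the whole file on one node depends on how DFS places blocks.

```java
public class BlockSizeSketch {
    // Assumed default block size (the quoted message guesses 64 MB).
    static final long DEFAULT_BLOCK_SIZE = 64L * 1024 * 1024;

    /**
     * Returns the smallest multiple of DEFAULT_BLOCK_SIZE that is at least
     * fileLength, so a file of that length fits in a single block.
     */
    static long singleBlockSizeFor(long fileLength) {
        if (fileLength <= 0) {
            return DEFAULT_BLOCK_SIZE;
        }
        // Ceiling division, then scale back up to bytes.
        long blocks = (fileLength + DEFAULT_BLOCK_SIZE - 1) / DEFAULT_BLOCK_SIZE;
        return blocks * DEFAULT_BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long indexFileLength = 3L * 1024 * 1024 * 1024; // a 3 GB index file
        long blockSize = singleBlockSizeFor(indexFileLength);
        System.out.println("blockSize = " + blockSize);
        // With a DFSClient instance, this value would be passed as the last
        // argument of the four-argument create method quoted above, e.g.
        //   client.create(src, true, replication, blockSize);
        // (client, src and replication are placeholders here).
    }
}
```

The file length has to be known, or at least bounded, before the file is created, since the block size is fixed at create time.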

Andrzej Bialecki wrote:
> Eric Baldeschwieler wrote:
>> You might try setting the block size for these files to be "very 
>> large".   This should guarantee that the entire file ends up on one node.
>> If an index is composed of many files, you could "tar" them together 
>> so each index is exactly one file.
>> Might work...  Of course as indexes get really large, this approach 
>> might have side effects.
> Sorry to be so obstinate, but this won't work either. First, when 
> segments are created they use whatever default block size is there (64MB 
> ?). Is there a per-file setBlockSize in the API? I couldn't find it - if 
> there isn't then the cluster would have to be shut down, reconfigured, 
> started, and the segment data would have to be copied to change its 
> block size ... yuck.
> Index cannot be tar-ed, because Lucene needs direct access to several 
> files included in the index.
> Index sizes are several gigabytes, and consist of ~30 files per 
> segment. Segment data is several tens of gigabytes in 4 MapFiles per 
> segment.
