hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: Any possible to set hdfs block size to a value smaller than 64MB?
Date Tue, 18 May 2010 12:48:00 GMT

On May 18, 2010, at 7:38 AM, Pierre ANCELOT wrote:

> Hi, thanks for this fast answer :)
> If so, what do you mean by blocks? Will a file be split only when it is
> larger than 64MB?

For every 64MB of the file, Hadoop will create a separate block.  So, if you have a 32KB file,
there will be one block of 32KB.  If the file is 65MB, then it will have one block of 64MB
and another block of 1MB.
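That arithmetic can be sketched in a few lines (a toy illustration of the block math above, not Hadoop's actual code; 64MB is the default block size under discussion):

```python
# Toy sketch of how HDFS carves a file into blocks (illustration only).
BLOCK_SIZE = 64 * 1024 * 1024  # the 64MB default discussed in this thread

def block_sizes(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file of file_size bytes needs."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

# A 32KB file -> one 32KB block; a 65MB file -> one 64MB block plus one 1MB block.
print(block_sizes(32 * 1024))         # [32768]
print(block_sizes(65 * 1024 * 1024))  # [67108864, 1048576]
```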

Splitting files is very useful for load-balancing and distributing I/O across multiple nodes.
At 32KB per file, you don't really need to split the files at all.
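The space-accounting point is worth putting in numbers: an HDFS block size is an upper bound on how big a block can get, not a fixed allocation unit, so a small file costs only its own bytes (per replica) plus a little namenode metadata. Here is a toy comparison against a hypothetical scheme that allocated whole 64MB blocks (invented purely for contrast):

```python
# Sketch: HDFS stores a file's actual bytes; the 64MB block size only
# caps how large any single block can grow.
BLOCK_SIZE = 64 * 1024 * 1024

def hdfs_bytes_stored(file_size):
    # Per replica, a file occupies its real size on the datanode.
    return file_size

def fixed_allocation_bytes(file_size, block_size=BLOCK_SIZE):
    # Hypothetical scheme that always allocates whole blocks (for contrast).
    n_blocks = max(1, -(-file_size // block_size))  # ceiling division
    return n_blocks * block_size

small = 32 * 1024  # the 32KB file from the thread
print(hdfs_bytes_stored(small))       # 32768 (~32KB actually stored)
print(fixed_allocation_bytes(small))  # 67108864 (the feared 64MB of waste)
```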

I recommend reading the HDFS design document for background on issues like this.

> On Tue, May 18, 2010 at 2:34 PM, Brian Bockelman <bbockelm@cse.unl.edu>wrote:
>> Hey Pierre,
>> These are not traditional filesystem blocks - if you save a file smaller
>> than 64MB, you don't lose 64MB of file space.
>> Hadoop will use 32KB to store a 32KB file (ok, plus a KB of metadata or
>> so), not 64MB.
>> Brian
>> On May 18, 2010, at 7:06 AM, Pierre ANCELOT wrote:
>>> Hi,
>>> I'm porting a legacy application to Hadoop and it uses a bunch of small
>>> files.
>>> I'm aware that having such small files ain't a good idea, but I'm not making
>>> the technical decisions, and the port has to be done for yesterday...
>>> Of course such small files are a problem; loading 64MB blocks for a few
>>> lines of text is an obvious waste.
>>> What will happen if I set a smaller, or even much smaller (32kB), block size?
>>> Thank you.
>>> Pierre ANCELOT.
> -- 
> http://www.neko-consulting.com
> Ego sum quis ego servo
> "Je suis ce que je prot├Ęge"
> "I am what I protect"
