hadoop-hdfs-user mailing list archives

From Sameer Farooqui <sam...@hortonworks.com>
Subject Re: Newbie question on block size calculation
Date Fri, 24 Feb 2012 08:44:05 GMT
Hey Viva,

If you're just getting started with HDFS, I recommend not thinking about
this purely as seek time vs. transfer time when deciding on the default
block size. Tom White makes a great point about why the block size is
generally large, but there are other factors to consider as well.
Tom is basically saying that if you set the block size to 100 MB, then
it'll take at least a second to read the block from disk before you can do
some MapReduce processing on it. If you instead set the block size to 10
MB, it would take 10 ms to do the disk seek and 100 ms to read the 10 MB
off disk, so roughly 10% of your disk time is wasted doing disk head seeks
instead of about 1%.
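To make the seek-versus-transfer arithmetic concrete, here's a small sketch using the thread's assumed figures (10 ms average seek, 100 MB/s sequential transfer; both are assumptions, not measurements):

```python
SEEK_TIME_S = 0.010       # ~10 ms average disk seek (assumed, per the thread)
TRANSFER_RATE_MBPS = 100  # ~100 MB/s sequential read (assumed, per the thread)

def seek_overhead(block_size_mb):
    """Fraction of a block's total read time spent seeking."""
    transfer_time_s = block_size_mb / TRANSFER_RATE_MBPS
    return SEEK_TIME_S / (SEEK_TIME_S + transfer_time_s)

print(f"10 MB block:  {seek_overhead(10):.1%} spent seeking")   # ~9.1%
print(f"100 MB block: {seek_overhead(100):.1%} spent seeking")  # ~1.0%
```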

Anyway, there are some other factors to consider. The block size will help
determine the # of Map tasks that get launched to process your data. For
example, say you want to do MapReduce analysis on a 10TB file in HDFS. If
the file's block size is 128MB, you will have 81,920 unique blocks making
up that file:

(10 terabytes) / (128 megabytes) = 81,920

With default replication of 3, you now have 245,760 blocks across the
cluster comprising that file in HDFS:

81,920 * 3 = 245,760
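The same arithmetic in a quick Python sketch (binary units, i.e. 1 TB = 1024^4 bytes, as used above):

```python
MB = 1024**2
TB = 1024**4

file_size = 10 * TB      # the 10 TB example file
block_size = 128 * MB    # the example HDFS block size
replication = 3          # HDFS default replication factor

unique_blocks = file_size // block_size
total_blocks = unique_blocks * replication

print(unique_blocks)  # 81920 unique blocks
print(total_blocks)   # 245760 blocks cluster-wide
```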

Since there are 81,920 unique blocks that make up that file, the MapReduce
framework by default will launch 81,920 Map tasks to process it (you can
influence MapReduce to use more or fewer maps with setNumMapTasks(int)). If
you make your block size 256MB, then only 40,960 Map tasks would be
launched; with a 1GB block size, only 10,240. And if only 10,240 map tasks
launch and each one has to read 1GB at 100 MB/s, it would take roughly 10
seconds for each map task just to read its 1GB chunk.
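Here's a sketch of how map-task count and per-split read time trade off as the block size grows (still assuming the thread's 100 MB/s read rate):

```python
MB = 1024**2
GB = 1024**3
TB = 1024**4

FILE_SIZE = 10 * TB
READ_RATE = 100 * MB  # bytes/s; the thread's assumed 100 MB/s

for block_size in (128 * MB, 256 * MB, 1 * GB):
    map_tasks = FILE_SIZE // block_size   # one map per block by default
    read_time_s = block_size / READ_RATE  # time just to read one split
    print(f"{block_size // MB:>4} MB blocks -> {map_tasks:5d} maps, "
          f"{read_time_s:5.2f} s to read each split")
```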

So, the point is that your block size can affect how fast or slow your
MapReduce jobs run. With many smaller blocks (say 128MB), the job can
spread across more map tasks and will probably run faster than with fewer,
larger blocks (say 1GB).

Now, you typically want around 10 to 100 maps per node. Also, spinning up a
Java Virtual Machine for each map takes a while, so it's best if each map
takes at least a minute to execute.
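One way to sanity-check that one-minute rule of thumb: estimate task duration as read time plus processing time. The 2 MB/s processing rate below is purely hypothetical (it depends entirely on what your mapper does), so treat this as a sketch, not a benchmark:

```python
def task_seconds(block_size_mb, read_rate_mbps=100, process_rate_mbps=2):
    """Rough map-task duration: read the split, then process it.
    read_rate_mbps is the thread's assumed 100 MB/s; process_rate_mbps
    is a hypothetical CPU-bound rate chosen only for illustration."""
    return block_size_mb / read_rate_mbps + block_size_mb / process_rate_mbps

for size in (64, 128, 256):
    verdict = "meets" if task_seconds(size) >= 60 else "is below"
    print(f"{size} MB split: ~{task_seconds(size):.0f} s, "
          f"{verdict} the one-minute guideline")
```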

Also, on a side note, an HDFS block doesn't behave like a Linux ext3 block
all the time. If the HDFS block size is 128MB but the file you want to
write to HDFS is only 25MB, then that block will only take up 25MB on disk.
So not every block is exactly 128MB; the last (or only) block of a file may
be smaller.

Finally, the block size and replication factor are configurable per file,
but you should set a good default for both based on your custom environment
and use case.

Sameer Farooqui
Systems Architect / Hortonworks

On Thu, Feb 23, 2012 at 6:43 AM, viva v <vivamailers@gmail.com> wrote:

> Thanks very much for the clarification.
> So, I guess we'd ideally set the block size based on the transfer rate for
> optimum results.
> If seek time has to be 0.5% of transfer time would i set my block size at
> 200MB (higher than transfer rate)?
> Conversely if seek time has to be 2% of transfer time would i still set my
> block size at 100MB?
> On Wed, Feb 22, 2012 at 8:16 PM, Praveen Sripati <praveensripati@gmail.com
> > wrote:
>> Seek time is ~ 10ms. If seek time has to be 1% of the transfer time then
>> transfer time has to be ~ 1000 ms (1s).
>> In ~ 1000 ms (1s) with a transfer rate of 100 MB/s, a block of 100MB can
>> be read.
>> Praveen
>> On Wed, Feb 22, 2012 at 11:22 AM, viva v <vivamailers@gmail.com> wrote:
>>> Have just started getting familiar with Hadoop & HDFS. Reading Tom
>>> White's book.
>>> The book describes an example related to HDFS block size. Here's a
>>> verbatim excerpt from the book
>>> "If the seek time is around 10 ms, and the transfer rate is 100 MB/s,
>>> then to make the seek time 1% of the transfer time, we need to make the
>>> block size around 100 MB."
>>> I can't seem to understand how we arrived at the fact that block size
>>> should be 100MB.
>>> Could someone please help me understand?
>>> Thanks
>>> Viva
