hadoop-mapreduce-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: Why big block size for HDFS.
Date Sun, 31 Mar 2013 18:58:23 GMT

From: Rahul Bhattacharjee [mailto:rahul.rec.dgp@gmail.com]
Subject: Why big block size for HDFS.

>In many places it has been written that, to avoid a huge number of disk seeks, we store big
blocks in HDFS, so that once we seek to the location, only the data transfer rate is
predominant and there are no more seeks. I am not sure if I have understood this correctly.
>My question is: no matter what block size we decide on, the data finally gets written to the
computer's HDD, which would be formatted with a block size in KBs, and while writing to the FS
(not HDFS) it is not guaranteed that the blocks we write are contiguous, so there would be
disk seeks anyway. The assumption of HDFS would only hold if the underlying FS guarantees to
write the data in contiguous blocks.

>Can someone explain a bit?
>Thanks,
>Rahul

While there are no guarantees that disk storage will be contiguous, the OS will attempt to
keep large files contiguous (and may even defragment over time), and if all files are written
using large blocks, this is more likely to be the case.  If storage is contiguous, you can
write a complete track without seeking.  Track size varies, but a 1TB disk might have roughly
500KB/track.  Stepping between adjacent tracks is also much cheaper than an average seek and,
as you might expect, has been optimized in hardware to assist sequential I/O.  However, if you
switch storage units, you will probably encounter at least one full seek at the start of the
block (since the head was probably somewhere else at the time).  The result is that, on
average, writing sequential files is very fast (>100MB/sec on typical SATA).  But I think the
block overhead has more to do with finding where to read the next block from, assuming that
the data has been distributed evenly.
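To put rough numbers on it, here is a back-of-envelope sketch in Python.  The 10 ms seek and
100MB/sec transfer rate are assumed figures for a typical SATA drive, not measurements, and it
charges one full seek per block, which is the pessimistic case:

# Back-of-envelope: fraction of time spent seeking, assuming one
# full seek per block and sustained sequential streaming afterwards.
SEEK_SEC = 0.010          # assumed average seek time (10 ms)
TRANSFER_BPS = 100e6      # assumed sequential transfer rate (100 MB/sec)

def seek_overhead(block_bytes):
    transfer_sec = block_bytes / TRANSFER_BPS
    return SEEK_SEC / (SEEK_SEC + transfer_sec)

for size in (4 * 1024, 1024 * 1024, 64 * 1024 * 1024, 128 * 1024 * 1024):
    print("%10d bytes -> %.1f%% of time spent seeking"
          % (size, 100 * seek_overhead(size)))

With 4KB blocks the (assumed) seek cost eats essentially all of the time; with 64MB or 128MB
blocks it drops to a percent or two.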

So consider connection overhead when the data is distributed.  I am no expert on the Hadoop
internals, but I suspect that somewhere a TCP connection is opened to transfer data.  Whether
connection overhead is reduced by maintaining open connection pools, I don’t know.  But
let’s assume that there is *some* overhead in switching the data transfer between machine
“A”, which owns block “1000”, and machine “B”, which owns block “1001”.  The
larger the block size, the less significant this overhead becomes relative to the sequential
transfer rate.
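The same amortization argument applies here.  The sketch below uses a purely assumed 50 ms
per-block "switch" cost (connection setup, block lookup, whatever it turns out to be) and an
assumed 100MB/sec sustained rate; neither is a measured Hadoop number:

# Per-block overhead as a fraction of total time, for various block sizes.
SWITCH_SEC = 0.050        # assumed cost of switching to the next block/datanode
NET_BPS = 100e6           # assumed sustained transfer rate

def switch_overhead(block_bytes):
    return SWITCH_SEC / (SWITCH_SEC + block_bytes / NET_BPS)

for mb in (1, 16, 64, 128, 256):
    size = mb * 1024 * 1024
    print("%4d MB block -> %.1f%% lost to per-block overhead"
          % (mb, 100 * switch_overhead(size)))

Whatever the real per-block cost is, a bigger block means it is paid less often per byte
transferred.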

In addition, MapReduce/YARN has an easier time of scheduling if there are fewer blocks.
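A tiny illustration of why the block count matters for scheduling (the 1TB file size is just an
example, and this assumes the common case of roughly one input split per block):

# Rough number of blocks (and hence splits) a scheduler sees for a 1 TB file.
FILE_BYTES = 1 * 1024**4   # 1 TB, chosen for illustration
for block in (4 * 1024, 64 * 1024**2, 128 * 1024**2):
    print("block size %9d bytes -> %10d blocks" % (block, FILE_BYTES // block))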
--john