hadoop-hdfs-user mailing list archives

From Pankaj Gupta <pan...@brightroll.com>
Subject HDFS block size
Date Fri, 16 Nov 2012 18:55:25 GMT

I apologize for asking a question that has probably been discussed many times before, but
I just want to be sure I understand it correctly. My question is about the advantages
of a large block size in HDFS.

Hadoop: The Definitive Guide compares HDFS with regular file systems and says the
advantage is a lower number of seeks (as far as I understood it; maybe I read it incorrectly,
and if so I apologize). But as I understand it, the datanode stores data on a regular file system.
If so, how does a bigger HDFS block size give better seek performance, when the data is
ultimately read from a regular file system that has a much smaller block size?
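One way to frame the seek argument is as arithmetic: if each HDFS block costs roughly one seek followed by a sequential read, the fraction of time spent seeking shrinks as the block grows. The sketch below assumes a 10 ms seek and a 100 MB/s transfer rate (illustrative numbers, not measurements), and it also assumes the block file on the datanode's local file system is laid out mostly contiguously, so the local file system's small block size does not add a seek per local block. The function name `seek_overhead` is hypothetical.

```python
# Assumed disk parameters, for illustration only (not measured values).
SEEK_TIME_S = 0.010          # ~10 ms average seek
TRANSFER_RATE_BPS = 100e6    # ~100 MB/s sequential transfer

MB = 1024 * 1024


def seek_overhead(block_bytes, seek_s=SEEK_TIME_S, rate_bps=TRANSFER_RATE_BPS):
    """Fraction of the read time spent seeking, assuming one seek per block
    followed by a purely sequential read of the whole block (i.e. the block
    file is stored mostly contiguously on the local file system)."""
    transfer_s = block_bytes / rate_bps
    return seek_s / (seek_s + transfer_s)


if __name__ == "__main__":
    for size in (4 * 1024, 1 * MB, 64 * MB, 128 * MB):
        print(f"{size:>12} bytes: {seek_overhead(size):6.1%} of time seeking")
```

With these assumed numbers, a 4 KiB read spends nearly all of its time seeking, while a 64 MiB block spends well under 2%. If the contiguity assumption does not hold on a given datanode, the real overhead would be higher.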

I see other advantages of a bigger block size, though:
- Fewer entries for the NameNode to keep track of.
- Less switching from datanode to datanode for the HDFS client when fetching the file. If the
  block size were small, this switching alone would reduce performance a lot. Perhaps this is
  the seek that the Definitive Guide refers to.
- Lower overhead for setting up map tasks. MapReduce usually creates one map task per block,
  so a smaller block means less computation per map task, and the cost of setting up each
  map task becomes significant.
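The three advantages listed above all scale with the number of blocks in a file. The sketch below counts blocks for a file at a few block sizes and treats that count as the number of NameNode block entries, block reads by the client, and map tasks. It assumes one map task per block (as described above) and ignores replication; the helper names are hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB


def blocks_for_file(file_bytes, block_bytes):
    """Number of HDFS blocks needed for one file (the last block may be partial)."""
    return -(-file_bytes // block_bytes)  # integer ceiling division


def summarize(file_bytes, block_bytes):
    """Per-file counts that grow with the number of blocks.

    Assumptions: replicas are not counted, and one map task is created per block.
    """
    n = blocks_for_file(file_bytes, block_bytes)
    return {
        "namenode_block_entries": n,  # metadata the NameNode must hold for this file
        "client_block_reads": n,      # each block may be served by a different datanode
        "map_tasks": n,               # one map task per block, as described in the post
    }


if __name__ == "__main__":
    for block in (4 * 1024, 1 * MB, 64 * MB, 128 * MB):
        print(block, summarize(1 * GB, block))
```

For a 1 GiB file, 64 MiB blocks give 16 blocks, while 4 KiB blocks would give 262,144, each needing its own NameNode entry, block read, and (under the stated assumption) map task.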

I want to make sure I understand the advantages of having a larger block size. I specifically
want to know whether there is any advantage in terms of disk seeks; that one thing has got
me very confused.

Thanks in advance,