hadoop-mapreduce-user mailing list archives

From Pankaj Gupta <pan...@brightroll.com>
Subject Re: HDFS block size
Date Fri, 16 Nov 2012 21:39:57 GMT
Thanks for the explanation and showing a different perspective.

On Fri, Nov 16, 2012 at 12:09 PM, Ted Dunning <tdunning@maprtech.com> wrote:

> Andy's points are reasonable, but there are a few omissions:
>
> - modern file systems are pretty good at writing large files into
> contiguous blocks if they have a reasonable amount of space available.
>
> - the seeks in question likely have more to do with checking directories
> for block locations than with seeking to small-ish file starts, because
> modern file systems tend to group together files that are written at
> about the same time.
>
> - it is quite possible to build an HDFS-like file system that uses very
> small blocks.  There really are three considerations here that, when
> conflated, make the design more difficult than necessary.  These three
> concepts are:
>
>     the primitive unit of disk allocation
>
> This is the granularity at which data is actually allocated on disk.  For
> HDFS this is variable, since blocks can be smaller than the maximum block
> size.  The key problem with a large allocation unit is that it makes it
> relatively difficult to allow quick reading of a file while it is still
> being written.  With a smaller unit, data can be committed in a way that
> lets a reader see it much sooner.  Extremely large block sizes also make
> read/write file systems and snapshots more difficult, for basically the
> same reason.  There is no strong reason that this has to be conflated with
> the striping chunk size.
>
> Putting HDFS on top of ext3 or ext4 kind of does this, but because HDFS
> knows nothing about the blocks in the underlying system, you don't get the
> benefit.
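>
> As a concrete illustration (a rough, untested sketch against the standard
> org.apache.hadoop.fs API; the class name, path and sizes are just
> placeholders): the HDFS "block size" is only a per-file attribute that the
> client picks at create time, not a fixed allocation unit on the datanodes'
> disks.
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class BlockSizeExample {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     // create(path, overwrite, bufferSize, replication, blockSize):
>     // ask for 256 MB blocks for this one file only.
>     long blockSize = 256L * 1024 * 1024;
>     FSDataOutputStream out = fs.create(
>         new Path("/tmp/example.dat"), true, 4096, (short) 3, blockSize);
>     out.writeBytes("hello\n");
>     out.close();
>   }
> }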
>
>     the unit of node striping
>
> This is the amount of data sent to each node and is intended to achieve
> read parallelism in map-reduce programs.  It should be large enough that a
> map task takes a reasonable amount of time to process it, which makes task
> scheduling easier.  A few hundred megabytes is commonly a good size, but
> different applications may prefer sizes as small as a MB or as large as a
> few GB.
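>
> In stock Hadoop this shows up as the input split size: FileInputFormat
> computes each split as max(minSplitSize, min(maxSplitSize, blockSize)), so
> the amount of work per map task can be tuned per job.  A rough, untested
> sketch (new-API classes; the class name, job name and sizes are
> placeholders):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>
> public class SplitSizeExample {
>   public static void main(String[] args) throws Exception {
>     Job job = new Job(new Configuration(), "split-size-demo");
>     // Ask for roughly 512 MB of input per map task, independent of the
>     // files' HDFS block size (splits that span blocks lose some locality).
>     FileInputFormat.setMinInputSplitSize(job, 512L * 1024 * 1024);
>     FileInputFormat.setMaxInputSplitSize(job, 1024L * 1024 * 1024);
>     // ... set mapper, reducer and input/output paths as usual ...
>   }
> }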
>
>     the unit of scaling
>
> It is typical that something, somewhere, needs to remember what got put
> where in the cluster.  Currently the namenode does this with blocks.
> Blocks are a bad choice here because they come and go quite often, which
> means the namenode has to handle lots of changes and which makes caching
> the namenode's data, or persisting it to disk, much harder.  Blocks also
> tend to limit scaling because a large system ends up with an enormous
> number of them.
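>
> As a rough back-of-envelope illustration (ballpark numbers only): 100 TB of
> data at a 128 MB block size is about 800,000 blocks, while at a 64 KB block
> size it is about 1.6 billion blocks.  At a commonly quoted ballpark of
> ~150 bytes of namenode heap per block, that is the difference between
> roughly 120 MB and a couple of hundred GB of metadata to keep in memory and
> to persist.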
>
> A counter-example to the design of HDFS is the MapR architecture.  There,
> the disk blocks are 8 KB, chunks are a few hundred megabytes (but flexible
> within a single cluster), and the scaling unit is tens of gigabytes.
>  Separating these concepts allows disk contiguity, efficient node striping
> and simple HA of the file system.
>
>
> On Fri, Nov 16, 2012 at 11:53 AM, Andy Isaacson <adi@cloudera.com> wrote:
>
>> On Fri, Nov 16, 2012 at 10:55 AM, Pankaj Gupta <pankaj@brightroll.com>
>> wrote:
>> > The Hadoop Definitive Guide provides a comparison with regular file
>> > systems and indicates that the advantage is a lower number of seeks (as
>> > far as I understood it; maybe I read it incorrectly, and if so I
>> > apologize).  But, as I understand it, the datanode stores data on a
>> > regular file system.  If that is so, then how does having a bigger HDFS
>> > block size provide better seek performance, when the data will
>> > ultimately be read from a regular file system which has a much smaller
>> > block size?
>>
>> Suppose that HDFS stored data in smaller blocks (64 KB, for example).
>> Then ext4 would have no reason to put those small files close together
>> on disk, and reading from an HDFS file would mean reading from very
>> many ext4 files, which would probably mean many seeks.
>>
>> The large block size design of HDFS avoids that problem by giving ext4
>> the information it needs to optimize for our desired use case.
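>>
>> For a rough sense of scale (my own back-of-envelope numbers): a 1 GB file
>> stored as 64 KB blocks would be spread over 16,384 separate block files on
>> the datanodes' local filesystems, versus just 8 block files at 128 MB, so
>> ext4 gets a handful of large, mostly contiguous files to lay out instead
>> of thousands of small ones.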
>>
>> > I see other advantages of a bigger block size though:
>> >
>> > Fewer entries on the NameNode to keep track of
>>
>> That's another benefit.
>>
>> > Less switching from datanode to datanode for the HDFS client when
>> > fetching the file. If the block size were small, just this switching
>> > would reduce performance a lot. Perhaps this is the seek that the
>> > Definitive Guide refers to.
>>
>> If you were building HDFS with a smaller block size, you'd probably
>> have to overlap block fetches from many datanodes in order to get
>> decent performance. So yes, this "switching", as you term it, would be
>> a performance bottleneck.
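>>
>> You can see exactly what the client has to walk through with the standard
>> FileSystem API; a rough, untested sketch (the class name is a placeholder,
>> and the path comes from the command line):
>>
>> import java.util.Arrays;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.BlockLocation;
>> import org.apache.hadoop.fs.FileStatus;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>>
>> public class ShowBlocks {
>>   public static void main(String[] args) throws Exception {
>>     FileSystem fs = FileSystem.get(new Configuration());
>>     FileStatus st = fs.getFileStatus(new Path(args[0]));
>>     // One entry per block: offset, length and the datanodes holding it.
>>     for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
>>       System.out.println(b.getOffset() + "+" + b.getLength()
>>           + " -> " + Arrays.toString(b.getHosts()));
>>     }
>>   }
>> }
>>
>> The smaller the block size, the more entries that loop prints and the
>> more datanodes a sequential read has to hop between.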
>>
>> > Less overhead cost of setting up map tasks. The way MR usually works is
>> > that one map task is created per block. A smaller block would mean less
>> > computation per map task, and thus the overhead of setting up the map
>> > task would become significant.
>>
>> An MR framework designed for a small-block HDFS would probably have to
>> do something different than one mapper per block.
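>>
>> Stock Hadoop already has a mechanism along those lines for inputs made of
>> many small files or blocks: CombineFileInputFormat packs several of them
>> into one split so a single mapper covers many of them. A rough, untested
>> sketch, assuming a release that ships CombineTextInputFormat (class name
>> and sizes are placeholders):
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
>>
>> public class CombineSplits {
>>   public static void main(String[] args) throws Exception {
>>     Job job = new Job(new Configuration(), "combine-demo");
>>     // Pack many small files/blocks into splits of up to ~256 MB each,
>>     // so one map task covers many of them.
>>     job.setInputFormatClass(CombineTextInputFormat.class);
>>     CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);
>>     // ... set mapper, reducer and input/output paths as usual ...
>>   }
>> }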
>>
>> > I want to make sure I understand the advantages of having a larger block
>> > size. I specifically want to know whether there is any advantage in
>> > terms of disk seeks; that one thing has got me very confused.
>>
>> Seems like you have a pretty good understanding of the issues, and I
>> hope I clarified the seek issue above.
>>
>> -andy
>>
>
>


-- 

P | (415) 677-9222 ext. 205  F | (415) 677-0895 | pankaj@brightroll.com

Pankaj Gupta | Software Engineer

BrightRoll, Inc. | Smart Video Advertising | www.brightroll.com

United States | Canada | United Kingdom | Germany

We're hiring! <http://newton.newtonsoftware.com/career/CareerHome.action?clientId=8a42a12b3580e2060135837631485aa7>
