hadoop-hdfs-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Exponential performance decay when inserting large number of blocks
Date Wed, 13 Jan 2010 21:34:24 GMT
Also, if you have the program you used to do the insertions and could
attach it, I'd be interested in trying to replicate this on a test cluster.
If you can't redistribute it, I can start from scratch, but it would be
easier to run yours.

Thanks
-Todd

On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <todd@cloudera.com> wrote:

> Hi Zlatin,
>
> This is a very interesting test you've run, and certainly not expected
> results. I know of many clusters happily chugging along with millions of
> blocks, so problems at 400K are very strange. By any chance were you able to
> collect profiling information from the NameNode while running this test?
>
> That said, I hope you've set the block size to 1KB for the purpose of this
> test and not because you expect to run that in production. Recommended block
> sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>
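
To put the block-size remark in numbers, here is a minimal sketch that counts the blocks needed for the data set described in the original report quoted below (8 writers, 3 rounds, 16MB per file). It assumes binary units (1KB = 1024 bytes) and that each file is split into blocks independently, with a possibly shorter last block; the names `FILE_SIZE`, `NUM_FILES` and `total_blocks` are made up for this illustration.

```python
import math

KB, MB = 1024, 1024 * 1024   # assumption: binary units
FILE_SIZE = 16 * MB          # per-file size from the quoted report
NUM_FILES = 8 * 3            # 8 insert processes, 3 rounds

def total_blocks(block_size):
    """Blocks needed for NUM_FILES files of FILE_SIZE bytes each.

    Each file is split on its own, so a file smaller than the
    block size still occupies exactly one (short) block.
    """
    return NUM_FILES * math.ceil(FILE_SIZE / block_size)

for label, size in [("1KB", 1 * KB), ("64MB", 64 * MB),
                    ("128MB", 128 * MB), ("256MB", 256 * MB)]:
    print(f"{label:>6}: {total_blocks(size)} blocks")
```

With a 1KB block size the same 384MB of data needs 393,216 blocks, against 24 blocks (one per file) at any of the recommended sizes.
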
> Thanks
> -Todd
>
> On Wed, Jan 13, 2010 at 1:21 PM, <Zlatin.Balevsky@barclayscapital.com> wrote:
>
>> Greetings,
>>
>> I am testing how HDFS scales with a very large number of blocks.  I used
>> the following setup:
>>
>> - Set the default block size to 1KB
>> - Started 8 insert processes, each inserting a 16MB file
>> - Repeated the insert 3 times, keeping the already inserted files in HDFS
>> - Repeated the entire experiment on two clusters, one with 4 and another
>>   with 11 identical datanodes (allocated through HOD)
>>
>> Results:
>> The first 128MB (2^17 blocks) insert finished in 5 minutes.  The second
>> in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
>> cluster was marginally faster.
>>
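
The figures in the results above can be turned into rough insertion rates. This is only a back-of-the-envelope sketch: it assumes binary units (1KB = 1024 bytes, 16MB = 16 * 2^20 bytes), counts blocks rather than replicas, and treats the one-hour mark of the unfinished third round as an upper bound on its rate. The helper name `avg_rate` is made up for this illustration.

```python
KB = 1024          # assumption: "1KB" means 1024 bytes
MB = 1024 * KB     # assumption: "16MB" means 16 * 2**20 bytes

block_size = 1 * KB
file_size = 16 * MB
writers = 8

# Blocks added by one round of 8 concurrent 16MB inserts.
blocks_per_round = writers * file_size // block_size

def avg_rate(blocks, minutes):
    """Average blocks added per second over a round of the given length."""
    return blocks / (minutes * 60)

# (round, duration in minutes); the third round did not finish within 60.
rounds = [("first", 5), ("second", 12), ("third, at most", 60)]

print("blocks per round:", blocks_per_round)
print("blocks after three rounds:", 3 * blocks_per_round)
for name, minutes in rounds:
    print(f"{name}: {avg_rate(blocks_per_round, minutes):.0f} blocks/s")
```

Under these assumptions each round adds 131,072 blocks (2^17), three rounds total 393,216 (the "400K" mentioned in the reply), and the average rate falls from roughly 437 blocks/s to roughly 182 blocks/s to under 36 blocks/s.
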
>> Throughout this I was storing all available metrics.  There were no
>> signs of insufficient memory on any of the nodes; CPU usage and garbage
>> collections were constant throughout.  If anyone is interested I can
>> provide the recorded metrics.  I've attached a chart that looks clearly
>> logarithmic.
>>
>> Can anyone please point to what could be the bottleneck here?  I'm
>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>> blocks.
>>
>> Best Regards,
>> Zlatin Balevsky
>>
>> <<insertion_rate_4_and_11_datanodes.JPG>>
>>
>
>
