hadoop-hdfs-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Exponential performance decay when inserting large number of blocks
Date Wed, 13 Jan 2010 22:18:50 GMT
Hey Zlatin,

Thanks for the explanation and the additional data. I'm a bit busy today but
will try to go through the data and reproduce the results later this week.

-Todd

On Wed, Jan 13, 2010 at 2:07 PM, <Zlatin.Balevsky@barclayscapital.com> wrote:

>  Todd,
>
> I used a shell script that launched 8 instances of the bin/hadoop fs -put
> utility.  After all 8 processes were done and I verified through the web UI
> that the files were inserted, I re-launched the script manually.  That
> is why you'll notice that in the metrics there are two short periods without
> any activity (I edited those out from the graph).  There were occasional
> NotReplicatedYet exceptions in the logs of those processes, but they were
> occurring at a constant rate.
>
> I did not run a profiler, but that will eventually be the next step.  I'm
> attaching the metrics from the namenode and one of the datanodes from the
> experiment with 4 datanodes.  They were recorded every 10 seconds.  Heap
> size for all processes is 2GB, and while there was occasional CPU usage on
> the NameNode it never reached 100% (and there are plenty of cores).
>
> Ultimately the block size will be much larger than the default as the total
> data will be in the 2^(well over 50) range.  With this test I am trying to
> determine if there are any bottlenecks at the NameNode component.
>
> Best Regards,
> Zlatin Balevsky
>
>  ------------------------------
> *From:* Todd Lipcon [mailto:todd@cloudera.com]
> *Sent:* Wednesday, January 13, 2010 4:34 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Exponential performance decay when inserting large number
> of blocks
>
> Also, if you have the program you used to do the insertions, and could
> attach it, I'd be interested in trying to replicate this on a test cluster.
> If you can't redistribute it, I can start from scratch, but it would be
> easier to run yours.
>
> Thanks
> -Todd
>
> On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> Hi Zlatin,
>>
>> This is a very interesting test you've run, and certainly not expected
>> results. I know of many clusters happily chugging along with millions of
>> blocks, so problems at 400K are very strange. By any chance were you able to
>> collect profiling information from the NameNode while running this test?
>>
>> That said, I hope you've set the block size to 1KB for the purpose of this
>> test and not because you expect to run that in production. Recommended block
>> sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>>
>> Thanks
>> -Todd
>>
>> On Wed, Jan 13, 2010 at 1:21 PM, <Zlatin.Balevsky@barclayscapital.com> wrote:
>>
>>> Greetings,
>>>
>>> I am testing how HDFS scales with a very large number of blocks.  I used
>>> the following setup:
>>>
>>> Set the default block size to 1KB
>>> Started 8 insert processes, each inserting a 16MB file
>>> Repeated the insert 3 times, keeping the already inserted files in HDFS
>>> Repeated the entire experiment on one cluster with 4 and another with 11
>>> identical datanodes (allocated through HOD)
>>>
>>> Results:
>>> The first 128MB (2^18 blocks) insert finished in 5 minutes.  The second
>>> in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
>>> cluster was marginally faster.
>>>
>>> Throughout this I was storing all available metrics.  There were no
>>> signs of insufficient memory on any of the nodes; CPU usage and garbage
>>> collections were constant throughout.  If anyone is interested I can
>>> provide the recorded metrics.  I've attached a chart that looks clearly
>>> logarithmic.
>>>
>>> Can anyone please point to what could be the bottleneck here?  I'm
>>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>>> blocks.
>>>
>>> <<insertion_rate_4_and_11_datanodes.JPG>>
>>>
>>> Best Regards,
>>> Zlatin Balevsky
>>>
>>> _______________________________________________
>>>
>>> This e-mail may contain information that is confidential, privileged or
>>> otherwise protected from disclosure. If you are not an intended recipient of
>>> this e-mail, do not duplicate or redistribute it by any means. Please delete
>>> it and any attachments and notify the sender that you have received it in
>>> error. Unless specifically indicated, this e-mail is not an offer to buy or
>>> sell or a solicitation to buy or sell any securities, investment products or
>>> other financial product or service, an official confirmation of any
>>> transaction, or an official statement of Barclays. Any views or opinions
>>> presented are solely those of the author and do not necessarily represent
>>> those of Barclays. This e-mail is subject to terms available at the
>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>> Barclays you consent to the foregoing.  Barclays Capital is the investment
>>> banking division of Barclays Bank PLC, a company registered in England
>>> (number 1026167) with its registered office at 1 Churchill Place, London,
>>> E14 5HP.  This email may relate to or be sent from other members of the
>>> Barclays Group.
>>> _______________________________________________
>>>
>>
>>

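The block counts implied by the setup quoted above can be checked with a little arithmetic. The sketch below is not from the thread: the function name `blocks_per_round` is hypothetical, and it assumes 1KB = 1024 bytes, 16MB = 16 * 1024 * 1024 bytes, and that each block is counted once regardless of replication. Under those assumptions one round of 8 files comes to 2^17 blocks rather than the 2^18 quoted, and three rounds come to roughly the 400K figure Todd mentions.

```python
# Back-of-the-envelope block counts for the test setup described in the thread.
KB = 1024
MB = 1024 * KB


def blocks_per_round(num_files, file_size_bytes, block_size_bytes):
    """Blocks created by one round of inserts (replicas are not counted)."""
    # Each file is split into ceil(file_size / block_size) blocks.
    blocks_per_file = -(-file_size_bytes // block_size_bytes)
    return num_files * blocks_per_file


# 8 insert processes, each writing a 16MB file, with a 1KB block size.
per_round = blocks_per_round(num_files=8, file_size_bytes=16 * MB, block_size_bytes=1 * KB)
total_after_three_rounds = 3 * per_round

print(per_round)                 # 131072, i.e. 2**17
print(total_after_three_rounds)  # 393216

# The same data with a 64MB block size needs one block per file.
print(blocks_per_round(8, 16 * MB, 64 * MB))  # 8
```

The last line illustrates why the block size matters so much for this kind of test: the same 128MB of data produces 8 blocks at 64MB per block, against 131072 blocks at 1KB per block.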