hadoop-hdfs-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Exponential performance decay when inserting large number of blocks
Date Wed, 13 Jan 2010 21:31:53 GMT
Hi Zlatin,

This is a very interesting test you've run, and certainly not expected
results. I know of many clusters happily chugging along with millions of
blocks, so problems at 400K are very strange. By any chance were you able to
collect profiling information from the NameNode while running this test?

That said, I hope you've set the block size to 1KB for the purpose of this
test and not because you expect to run that in production. Recommended block
sizes are at least 64MB and often 128MB or 256MB for larger clusters.
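
As a rough sanity check on the block counts discussed in this thread, here is a minimal sketch of the arithmetic. It assumes binary units (1KB = 1024 bytes), counts logical blocks per file, and ignores replication; the function and variable names are illustrative only and are not part of any Hadoop API.

```python
KB = 1024
MB = 1024 * KB

def blocks_per_file(file_bytes, block_bytes):
    """Blocks needed for one file: ceiling division, since the last block may be partial."""
    return -(-file_bytes // block_bytes)

def blocks_per_round(block_bytes, writers=8, file_bytes=16 * MB):
    """Blocks created by one round of the quoted test (8 writers, one 16MB file each)."""
    return writers * blocks_per_file(file_bytes, block_bytes)

# 1KB blocks, as in the test described in the quoted message.
per_round = blocks_per_round(1 * KB)
after_three_rounds = 3 * per_round
print(per_round)            # 131072, i.e. 2**17
print(after_three_rounds)   # 393216, roughly the 400K mentioned above

# The same data with the block sizes recommended above.
for size_mb in (64, 128, 256):
    print(size_mb, blocks_per_round(size_mb * MB))   # 8 blocks in each case
```

Under these assumptions one round produces 2^17 blocks rather than the 2^18 quoted below, so the original figure may be counting something different (replicas, for instance). With 64MB or larger blocks, each 16MB file fits in a single block, which is why the same 128MB of data needs only 8 blocks.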

Thanks
-Todd

On Wed, Jan 13, 2010 at 1:21 PM, <Zlatin.Balevsky@barclayscapital.com> wrote:

> Greetings,
>
> I am testing how HDFS scales with a very large number of blocks.  I used
> the following setup:
>
> 1. Set the default block size to 1KB
> 2. Started 8 insert processes, each inserting a 16MB file
> 3. Repeated the insert 3 times, keeping the already inserted files in HDFS
> 4. Repeated the entire experiment on one cluster with 4 and another with 11
>    identical datanodes (allocated through HOD)
>
> Results:
> The first 128MB (2^18 blocks) insert finished in 5 minutes.  The second
> in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
> cluster was marginally faster.
>
> Throughout this I was storing all available metrics.  There were no
> signs of insufficient memory on any of the nodes; CPU usage and garbage
> collections were constant throughout.  If anyone is interested I can
> provide the recorded metrics.  I've attached a chart that looks clearly
> logarithmic.
>
> Can anyone please point to what could be the bottleneck here?  I'm
> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
> blocks.
>
> <<insertion_rate_4_and_11_datanodes.JPG>>
>
> Best Regards,
> Zlatin Balevsky
