hadoop-hdfs-user mailing list archives

From <Zlatin.Balev...@barclayscapital.com>
Subject Exponential performance decay when inserting large number of blocks
Date Wed, 13 Jan 2010 21:21:23 GMT

I am testing how HDFS scales with a very large number of blocks.  I used
the following setup:

Set the default block size to 1KB
Started 8 insert processes, each inserting a 16MB file
Repeated the insert 3 times, keeping the already inserted files in HDFS
Repeated the entire experiment on two clusters, one with 4 and another
with 11 identical datanodes (allocated through HOD)

The first 128MB (2^18 blocks) insert finished in 5 minutes.  The second
in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
cluster was marginally faster.
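
For reference, the figures above can be turned into rough insertion rates. The sketch below is only back-of-the-envelope arithmetic on the numbers stated in this message; the variable names are illustrative. It assumes one block per 1KB of file data and that each round writes the same amount. Under those assumptions 128MB works out to 2^17 blocks rather than the 2^18 quoted above; the factor of two may come from how blocks or replicas were counted, which the message does not say.

```python
# Rough arithmetic on the numbers reported in the message (not a measurement).
KB = 1024
MB = 1024 * KB

block_size = 1 * KB     # default block size from the setup
writers = 8             # concurrent insert processes
file_size = 16 * MB     # size of the file each process inserts

bytes_per_round = writers * file_size
# Assumption: one block per full 1KB of data, replicas not counted.
blocks_per_round = bytes_per_round // block_size

# Reported wall-clock minutes per round. Round 3 did not finish within
# an hour, so 60 is only a lower bound on its duration.
reported_minutes = {1: 5, 2: 12, 3: 60}


def blocks_per_second(blocks, minutes):
    """Average insertion rate over a round."""
    return blocks / (minutes * 60)


rates = {
    rnd: blocks_per_second(blocks_per_round, minutes)
    for rnd, minutes in reported_minutes.items()
}

if __name__ == "__main__":
    print(f"{bytes_per_round // MB} MB per round, {blocks_per_round} blocks")
    for rnd, rate in rates.items():
        # The round 3 figure is an upper bound on the rate, since the run was cut off.
        print(f"round {rnd}: about {rate:.0f} blocks/s")
```

Read this way, the average rate falls by more than half between the first and second rounds, and by at least a further factor of five in the third.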

Throughout this I was storing all available metrics.  There were no
signs of insufficient memory on any of the nodes; CPU usage and garbage
collections were constant throughout.  If anyone is interested I can
provide the recorded metrics.  I've attached a chart that looks clearly

Can anyone please point to what could be the bottleneck here?  I'm
evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)

<<insertion_rate_4_and_11_datanodes.JPG>>

Best Regards,
Zlatin Balevsky 

