hadoop-hdfs-user mailing list archives

From <Zlatin.Balev...@barclayscapital.com>
Subject RE: Exponential performance decay when inserting large number of blocks
Date Wed, 13 Jan 2010 22:07:15 GMT
Todd,
 
I used a shell script that launched 8 instances of the bin/hadoop fs
-put utility.  After all 8 processes were done and I had verified through
the web UI that the files were inserted, I manually re-launched the
script.  That is why you'll notice two short periods without any activity
in the metrics (I edited those out of the graph).  There were occasional
NotReplicatedYet exceptions in the logs of those processes, but they
occurred at a constant rate.
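
For reference, the driver script was essentially of the following shape
(the file names and target directory are placeholders rather than the
exact paths I used):

    #!/bin/sh
    # Launch 8 concurrent puts and wait for all of them to finish.
    for i in 1 2 3 4 5 6 7 8; do
        bin/hadoop fs -put local/file_${i} /test/file_${i} &
    done
    wait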
 
I did not run a profiler, but that will eventually be the next step.
I'm attaching the metrics from the NameNode and one of the datanodes
from the experiment with 4 datanodes; they were recorded every 10
seconds.  Heap size for all processes was 2GB, and while there was
occasional CPU activity on the NameNode, it never reached 100% (and
there are plenty of cores).
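
(If it helps anyone reproduce the collection: one way to record the dfs
context every 10 seconds is a file-based conf/hadoop-metrics.properties
along these lines; the output path here is just a placeholder.)

    # Write the dfs metrics context to a local file every 10 seconds
    dfs.class=org.apache.hadoop.metrics.file.FileContext
    dfs.period=10
    dfs.fileName=/tmp/dfs_metrics.log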
 
Ultimately the block size will be much larger than the default, as the
total data will be in the 2^(well over 50) range.  With this test I am
trying to determine whether there are any bottlenecks in the NameNode
component.
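
For a sense of scale, with purely illustrative numbers: even at a 128MB
block size, a 2^50-byte data set still means on the order of 2^23 blocks
for the NameNode to track.

    # Illustrative only: 2^50 bytes of data split into 128MB blocks
    echo $(( (1 << 50) / (128 * 1024 * 1024) ))   # prints 8388608, i.e. 2^23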
 
Best Regards,
Zlatin Balevsky
 
________________________________

From: Todd Lipcon [mailto:todd@cloudera.com] 
Sent: Wednesday, January 13, 2010 4:34 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay when inserting large number of blocks


Also, if you have the program you used to do the insertions and could
attach it, I'd be interested in trying to replicate this on a test
cluster.  If you can't redistribute it, I can start from scratch, but it
would be easier to run yours.

Thanks
-Todd


On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <todd@cloudera.com> wrote:


	Hi Zlatin, 

	This is a very interesting test you've run, and certainly not the
	expected results.  I know of many clusters happily chugging along with
	millions of blocks, so problems at 400K blocks are very strange.  By
	any chance were you able to collect profiling information from the
	NameNode while running this test?
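
	Even a couple of thread dumps taken while a run is slowing down
	would help; something along these lines against the NameNode process
	(the pid lookup is only an example):

	    # Take a few NameNode thread dumps ~10 seconds apart (jps/jstack ship with the JDK)
	    NN_PID=$(jps | awk '$2 == "NameNode" {print $1}')
	    for i in 1 2 3; do jstack $NN_PID > nn_stack_$i.txt; sleep 10; done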

	That said, I hope you've set the block size to 1KB for the purpose
	of this test and not because you expect to run that in production.
	Recommended block sizes are at least 64MB, and often 128MB or 256MB
	for larger clusters.
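
	If the 1KB setting is only for experiments, note that you can also
	override the block size per command instead of cluster-wide, something
	like the following (the path and size are only examples):

	    # 128MB blocks for a single upload via the generic -D option
	    bin/hadoop fs -D dfs.block.size=134217728 -put bigfile /data/bigfile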

	Thanks
	-Todd

	On Wed, Jan 13, 2010 at 1:21 PM, <Zlatin.Balevsky@barclayscapital.com> wrote:
	

		Greetings,
		
		I am testing how HDFS scales with a very large number of blocks.
		The setup was as follows:
		
		Set the default block size to 1KB
		Started 8 insert processes, each inserting a 16MB file
		Repeated the insert 3 times, keeping the already inserted files in HDFS
		Repeated the entire experiment on one cluster with 4 and another with 11
		identical datanodes (allocated through HOD)
		
		Results:
		The first 128MB (2^18 blocks) insert finished in 5 minutes.  The
		second took 12 minutes.  The third did not finish within 1 hour.
		The 11-node cluster was marginally faster.
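
		(For reference, 2^18 block allocations in 5 minutes works out
		to roughly 870 new blocks per second arriving at the NameNode:)

		    # 2^18 blocks created in 300 seconds
		    echo $(( (1 << 18) / 300 ))   # prints 873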
		
		Throughout this I was storing all available metrics.  There were
		no signs of insufficient memory on any of the nodes; CPU usage and
		garbage collections were constant throughout.  If anyone is
		interested I can provide the recorded metrics.  I've attached a
		chart that looks clearly logarithmic.
		
		Can anyone please point to what could be the bottleneck here?
		I'm evaluating HDFS for usage scenarios requiring 2^(a lot more
		than 18) blocks.
		
		<<insertion_rate_4_and_11_datanodes.JPG>>
		
		Best Regards,
		Zlatin Balevsky
		




