Return-Path:
Delivered-To: apmail-hadoop-hdfs-user-archive@minotaur.apache.org
Received: (qmail 1203 invoked from network); 13 Jan 2010 22:19:43 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
    by minotaur.apache.org with SMTP; 13 Jan 2010 22:19:43 -0000
Received: (qmail 55738 invoked by uid 500); 13 Jan 2010 22:19:42 -0000
Delivered-To: apmail-hadoop-hdfs-user-archive@hadoop.apache.org
Received: (qmail 55657 invoked by uid 500); 13 Jan 2010 22:19:42 -0000
Mailing-List: contact hdfs-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hdfs-user@hadoop.apache.org
Delivered-To: mailing list hdfs-user@hadoop.apache.org
Received: (qmail 55648 invoked by uid 99); 13 Jan 2010 22:19:42 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 13 Jan 2010 22:19:42 +0000
X-ASF-Spam-Status: No, hits=3.4 required=10.0
    tests=HTML_MESSAGE,SPF_NEUTRAL
X-Spam-Check-By: apache.org
Received-SPF: neutral (nike.apache.org: local policy)
Received: from [209.85.160.50] (HELO mail-pw0-f50.google.com) (209.85.160.50)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 13 Jan 2010 22:19:32 +0000
Received: by pwi20 with SMTP id 20so3472318pwi.29
    for ; Wed, 13 Jan 2010 14:19:10 -0800 (PST)
MIME-Version: 1.0
Received: by 10.142.151.31 with SMTP id y31mr1795742wfd.107.1263421150101;
    Wed, 13 Jan 2010 14:19:10 -0800 (PST)
In-Reply-To: <0415D0186561FD448D99C301257B70910120BA47@NYKPCMEU304VEUA.INTRANET.BARCAPINT.COM>
References: <0415D0186561FD448D99C301257B70910120BA46@NYKPCMEU304VEUA.INTRANET.BARCAPINT.COM>
    <45f85f71001131331i9ac9767nc401ad3d66a628f1@mail.gmail.com>
    <45f85f71001131334p24540cbdq6d7ac30c31c33996@mail.gmail.com>
    <0415D0186561FD448D99C301257B70910120BA47@NYKPCMEU304VEUA.INTRANET.BARCAPINT.COM>
From: Todd Lipcon
Date: Wed, 13 Jan 2010 14:18:50 -0800
Message-ID: <45f85f71001131418s4611cd65k859c100378e43f42@mail.gmail.com>
Subject: Re: Exponential performance decay when inserting large number of blocks
To: hdfs-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=000e0cd3113ada8331047d132875
X-Virus-Checked: Checked by ClamAV on apache.org

--000e0cd3113ada8331047d132875
Content-Type: text/plain; charset=ISO-8859-1

Hey Zlatin,

Thanks for the explanation and the additional data. I'm a bit busy today but
will try to go through the data and reproduce the results later this week.

-Todd

On Wed, Jan 13, 2010 at 2:07 PM, wrote:

> Todd,
>
> I used a shell script that launched 8 instances of the bin/hadoop fs -put
> utility. After all 8 processes were done and I verified through the web ui
> that the files were inserted, I re-launched the script manually again. That
> is why you'll notice that in the metrics there are two short periods without
> any activity (I edited those out from the graph). There were occasional
> NotReplicatedYet exceptions in the logs of those processes, but they were
> occurring at a constant rate.
>
> I did not run a profiler, but that will eventually be the next step. I'm
> attaching the metrics from the namenode and one of the datanodes from the
> experiment with 4 datanodes. They were recorded every 10 seconds. Heap
> size for all processes is 2GB, and while there was occasional CPU usage on
> the Namenode it was never 100%. (and there are plenty of cores).
>
> Ultimately the block size will be much larger than the default as the total
> data will be in the 2^(well over 50) range. With this test I am trying to
> determine if there are any bottlenecks at the NameNode component.
>
> Best Regards,
> Zlatin Balevsky
>
> ------------------------------
> *From:* Todd Lipcon [mailto:todd@cloudera.com]
> *Sent:* Wednesday, January 13, 2010 4:34 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Exponential performance decay when inserting large number
> of blocks
>
> Also, if you have the program you used to do the insertions, and could
> attach it, I'd be interested in trying to replicate this on a test cluster.
> If you can't redistribute it, I can start from scratch, but would be easier
> to run yours.
>
> Thanks
> -Todd
>
> On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon wrote:
>
>> Hi Zlatin,
>>
>> This is a very interesting test you've run, and certainly not expected
>> results. I know of many clusters happily chugging along with millions of
>> blocks, so problems at 400K are very strange. By any chance were you able to
>> collect profiling information from the NameNode while running this test?
>>
>> That said, I hope you've set the block size to 1KB for the purpose of this
>> test and not because you expect to run that in production. Recommended block
>> sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>>
>> Thanks
>> -Todd
>>
>> On Wed, Jan 13, 2010 at 1:21 PM, wrote:
>>
>>> Greetings,
>>>
>>> I am testing how HDFS scales with a very large number of blocks. I did
>>> the following setup:
>>>
>>> Set the default block size to 1KB
>>> Started 8 insert processes, each inserting a 16MB file
>>> Repeated the insert 3 times, keeping the already inserted files in HDFS
>>> Repeated the entire experiment on one cluster with 4 and another with 11
>>> identical datanodes (allocated through HOD)
>>>
>>> Results:
>>> The first 128MB (2^18 blocks) insert finished in 5 minutes. The second
>>> in 12 minutes. The third didn't finish within 1 hour. The 11-node
>>> cluster was marginally faster.
>>>
>>> Throughout this I was storing all available metrics. There were no
>>> signs of insufficient memory on any of the nodes; CPU usage and garbage
>>> collections were constant throughout. If anyone is interested I can
>>> provide the recorded metrics. I've attached a chart that looks clearly
>>> logarithmic.
>>>
>>> Can anyone please point to what could be the bottleneck here? I'm
>>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>>> blocks.
>>>
>>> <<insertion_rate_4_and_11_datanodes.JPG>>
>>>
>>> Best Regards,
>>> Zlatin Balevsky
>>>
>>> _______________________________________________
>>>
>>> This e-mail may contain information that is confidential, privileged or
>>> otherwise protected from disclosure. If you are not an intended recipient of
>>> this e-mail, do not duplicate or redistribute it by any means. Please delete
>>> it and any attachments and notify the sender that you have received it in
>>> error. Unless specifically indicated, this e-mail is not an offer to buy or
>>> sell or a solicitation to buy or sell any securities, investment products or
>>> other financial product or service, an official confirmation of any
>>> transaction, or an official statement of Barclays. Any views or opinions
>>> presented are solely those of the author and do not necessarily represent
>>> those of Barclays. This e-mail is subject to terms available at the
>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>> Barclays you consent to the foregoing. Barclays Capital is the investment
>>> banking division of Barclays Bank PLC, a company registered in England
>>> (number 1026167) with its registered office at 1 Churchill Place, London,
>>> E14 5HP. This email may relate to or be sent from other members of the
>>> Barclays Group.
>>> _______________________________________________
>>>
>>
>>

--000e0cd3113ada8331047d132875
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hey Zlatin,

Thanks for the explanation and the additional data. I'm a bit busy today but will try to go through the data and reproduce the results later this week.

-Todd

On Wed, Jan 13, 2010 at 2:07 PM, <Zlatin.Balevsky@barclayscapital.com> wrote:
Todd,

I used a shell script that launched 8 instances of the bin/hadoop fs -put utility. After all 8 processes were done and I verified through the web ui that the files were inserted, I re-launched the script manually again. That is why you'll notice that in the metrics there are two short periods without any activity (I edited those out from the graph). There were occasional NotReplicatedYet exceptions in the logs of those processes, but they were occurring at a constant rate.

I did not run a profiler, but that will eventually be the next step. I'm attaching the metrics from the namenode and one of the datanodes from the experiment with 4 datanodes. They were recorded every 10 seconds. Heap size for all processes is 2GB, and while there was occasional CPU usage on the Namenode it was never 100%. (and there are plenty of cores).

Ultimately the block size will be much larger than the default as the total data will be in the 2^(well over 50) range. With this test I am trying to determine if there are any bottlenecks at the NameNode component.

Best Regards,
Zlatin Balevsky
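
For anyone trying to reproduce the load pattern described above, here is a minimal sketch of one insert round. It is not the original shell script, which was not posted. It assumes the block size is already configured on the cluster, and the function names, file names and destination directory are hypothetical. Only the `bin/hadoop fs -put` invocation comes from the message.

```python
import subprocess
from typing import List


def build_put_commands(local_files: List[str], dest_dir: str,
                       hadoop_bin: str = "bin/hadoop") -> List[List[str]]:
    """Build one `hadoop fs -put <local> <dest>` command per input file."""
    return [[hadoop_bin, "fs", "-put", path, dest_dir] for path in local_files]


def run_round(commands: List[List[str]]) -> List[int]:
    """Start all commands at once, wait for every one, return the exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]  # launched in parallel
    return [proc.wait() for proc in procs]


# Hypothetical inputs: 8 writers per round, as in the description above.
round_files = [f"data/part-{i}.bin" for i in range(8)]
round_commands = build_put_commands(round_files, "/blocktest/round1")
```

Calling `run_round(round_commands)` on a machine with a Hadoop client would start the 8 uploads together and block until all have exited. Repeating it with a new destination directory would correspond to re-launching the script by hand.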

From: Todd Lipcon [mailto:todd@cloudera.com]
Sent: Wednesday, January 13, 2010 4:34 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay when inserting large number of blocks

Also, if you have the program you used to do the insertions, and could attach it, I'd be interested in trying to replicate this on a test cluster. If you can't redistribute it, I can start from scratch, but would be easier to run yours.

Thanks
-Todd

On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <todd@cloudera.com> wrote:
Hi Zlatin,

This is a very interesting test you've run, and certainly not expected results. I know of many clusters happily chugging along with millions of blocks, so problems at 400K are very strange. By any chance were you able to collect profiling information from the NameNode while running this test?

That said, I hope you've set the block size to 1KB for the purpose of this test and not because you expect to run that in production. Recommended block sizes are at least 64MB and often 128MB or 256MB for larger clusters.

Thanks
-Todd
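
To put the recommended block sizes next to the data volume mentioned in this thread, here is a rough block-count estimate. It is a sketch under stated assumptions: the total is taken as exactly 2^50 bytes (the thread only says "well over 50", so this is a lower bound), sizes use binary units, files are assumed large enough to fill whole blocks, and replication is ignored. The helper name `blocks_needed` is made up for this example.

```python
MB = 2 ** 20


def blocks_needed(total_bytes: int, block_size: int) -> int:
    """Blocks required to hold total_bytes, rounding up (ceiling division)."""
    return -(-total_bytes // block_size)


TOTAL_BYTES = 2 ** 50  # assumed lower bound for the eventual data set

# Block counts for the recommended block sizes of 64MB, 128MB and 256MB.
estimates = {size_mb: blocks_needed(TOTAL_BYTES, size_mb * MB)
             for size_mb in (64, 128, 256)}

for size_mb, count in estimates.items():
    print(f"{size_mb}MB blocks: {count}")
```

Under these assumptions a 64MB block size gives 2^24 blocks (about 16.8 million), and each doubling of the block size halves that count.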

On Wed, Jan 13, 2010 at 1:21 PM, <Zlatin.Balevsky@barclayscapital.com> wrote:
Greetings,

I am testing how HDFS scales with a very large number of blocks. I did
the following setup:

Set the default block size to 1KB
Started 8 insert processes, each inserting a 16MB file
Repeated the insert 3 times, keeping the already inserted files in HDFS
Repeated the entire experiment on one cluster with 4 and another with 11
identical datanodes (allocated through HOD)

Results:
The first 128MB (2^18 blocks) insert finished in 5 minutes. The second
in 12 minutes. The third didn't finish within 1 hour. The 11-node
cluster was marginally faster.

Throughout this I was storing all available metrics. There were no
signs of insufficient memory on any of the nodes; CPU usage and garbage
collections were constant throughout. If anyone is interested I can
provide the recorded metrics. I've attached a chart that looks clearly
logarithmic.

Can anyone please point to what could be the bottleneck here? I'm
evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
blocks.

<<insertion_rate_4_and_11_datanodes.JPG>>

Best Regards,
Zlatin Balevsky
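
As a sanity check on the figures in the message above, the arithmetic can be worked through as follows. This is a sketch that assumes binary units (1KB = 2^10 bytes, 1MB = 2^20 bytes), counts each block once regardless of replication, and uses variable names invented for the example. With those assumptions one round of 8 x 16MB at a 1KB block size comes to 2^17 blocks, not 2^18, and three rounds come to 393,216, which is in line with the "400K" mentioned elsewhere in the thread. The third round never finished, so its rate is only an upper bound.

```python
KB, MB = 2 ** 10, 2 ** 20

BLOCK_SIZE = 1 * KB   # block size used for the test
WRITERS = 8           # parallel insert processes per round
FILE_SIZE = 16 * MB   # size of the file each process inserts

blocks_per_round = WRITERS * FILE_SIZE // BLOCK_SIZE
blocks_after_three_rounds = 3 * blocks_per_round


def blocks_per_second(blocks: int, minutes: float) -> float:
    """Average insertion rate for a round that took the given wall time."""
    return blocks / (minutes * 60)


round1_rate = blocks_per_second(blocks_per_round, 5)    # finished in 5 minutes
round2_rate = blocks_per_second(blocks_per_round, 12)   # finished in 12 minutes
round3_bound = blocks_per_second(blocks_per_round, 60)  # upper bound, unfinished

print(blocks_per_round, blocks_after_three_rounds)
print(round(round1_rate, 1), round(round2_rate, 1), round(round3_bound, 1))
```

On these numbers the average rate falls from roughly 437 blocks per second in the first round to roughly 182 in the second, and to below about 36 in the third.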


--000e0cd3113ada8331047d132875--