From: Jeffrey Buell <jbuell@vmware.com>
To: hdfs-user@hadoop.apache.org
Date: Sun, 27 Feb 2011 21:01:37 -0800
Subject: RE: balancing and replication in HDFS

Todd,

Much better.  I also had to adjust the number of map tasks for the gen step.  Storage was spread across 3 nodes instead of 4, but a bit more playing with these parameters should do the trick.

Thanks for your help.

Jeff

From: Todd Lipcon [mailto:todd@cloudera.com]
Sent: Friday, February 25, 2011 8:09 PM
To: hdfs-user@hadoop.apache.org
Cc: Jeffrey Buell
Subject: Re: balancing and replication in HDFS

When you run terasort, pass -Dmapred.reduce.tasks=4 and see how that goes for you. See this old thread for info:

http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3Ccbbf4b570906300617ma4505f5o2aa1b9fb87b31868@mail.gmail.com%3E

-Todd

On Fri, Feb 25, 2011 at 4:45 PM, Jeffrey Buell <jbuell@vmware.com> wrote:

Hi Todd,

Thanks for the quick response.

I added <final>true</final> to dfs.replication, but I still get just one output copy.  Can Hadoop apps override the replication level even with this parameter?
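For reference, the entry I added looks roughly like this (it lives in hdfs-site.xml in my setup, and the value 2 is the replication level the cluster is configured for):

<!-- hdfs-site.xml (assumed location): replication factor 2, marked final so
     later-loaded config resources cannot override it -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <final>true</final>
</property>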
I tried increasing mapred.tasktracker.reduce.tasks.maximum from 1 to 4, but that didn't make any difference: output is still all on one node.  I thought that parameter controls the number of reduce tasks per node, so 1 should be sufficient; is that correct logic?  The other reduce parameters are at their defaults.  Which parameters should I be trying?

There's no way the sort can run efficiently if the gen is unbalanced.  At 5 GB, there are plenty of blocks to make the distribution even.  How does HDFS decide to spread the storage across the nodes?  Note that the sizes of the 4 chunks (1.7, ..., 3.2 GB) seem to be repeatable, but they land on different nodes each time teragen is run.

Jeff

> -----Original Message-----
> From: Todd Lipcon [mailto:todd@cloudera.com]
> Sent: Friday, February 25, 2011 3:13 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: balancing and replication in HDFS
>
> Hi Jeff,
>
> The output of terasort has replication level 1 by default. This is so
> it goes faster with the default settings and makes for more impressive
> benchmark results :)
>
> The reason you see it all on one machine is probably that you're
> running with one reducer. Try configuring your terasort to use more
> reduce tasks and you should see the load (and space usage) even out.
>
> -Todd
>
> On Fri, Feb 25, 2011 at 2:52 PM, Jeffrey Buell <jbuell@vmware.com> wrote:
> >
> > I'm a newbie to hadoop and HDFS.  I'm seeing odd behavior in HDFS
> > that I hope somebody can clear up for me.  I'm running hadoop version
> > 0.20.1+169.127 from the cloudera distro on 4 identical nodes, each with
> > 4 cpus and 100GB disk space.  Replication is set to 2.
> >
> > I run:
> >
> > hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar teragen 50000000 tera_in5
> >
> > This produces the expected 10GB of data on disk (5GB * 2 copies).
> > But the data is spread very unevenly across the nodes, ranging from
> > 1.7 to 3.2 GB on each node.  Then I sort the data:
> >
> > hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar terasort tera_in5 tera_out5
> >
> > It finishes successfully, and HDFS recognizes the right amount of data:
> >
> > $ hadoop fs -du /user/hadoop/
> > Found 2 items
> > 5000023410  hdfs://namd-1/user/hadoop/tera_in5
> > 5000170993  hdfs://namd-1/user/hadoop/tera_out5
> >
> > However all the new data is on one node (apparently randomly chosen),
> > and the total disk usage is only 15GB, which means that the output data
> > is not replicated.  For nearly all the elapsed time of the sort, the
> > other 3 nodes are idle.  Some of the output data is in
> > dfs/data/current, but a lot is in one of 64 new subdirs
> > (dfs/data/current/subdir0 through subdir63).
> >
> > Why is all this happening?  Am I missing some tunables that make HDFS
> > do the right balance and replication?
> >
> > Thanks,
> >
> > Jeff
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
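Taken together, the suggested runs look roughly like this (a sketch assembled from the commands and settings discussed in this thread, not a verified recipe; using mapred.map.tasks for the gen step is an assumption about which knob was adjusted):

# Sketch only, based on the thread above.
# Spread the teragen output by running more map tasks (mapred.map.tasks is assumed
# to be the "number of map tasks for the gen step" mentioned at the top).
hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar teragen -Dmapred.map.tasks=4 50000000 tera_in5

# Spread the sort output by running more reduce tasks, per the -Dmapred.reduce.tasks=4 advice.
# (mapred.tasktracker.reduce.tasks.maximum only caps concurrent reduce slots per node;
# mapred.reduce.tasks sets how many reduce tasks the job actually runs.)
hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar terasort -Dmapred.reduce.tasks=4 tera_in5 tera_out5

# Check how the data ended up distributed across the datanodes.
hadoop dfsadmin -report
hadoop fs -du /user/hadoop/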