From: Arun Ramakrishnan
To: hdfs-user@hadoop.apache.org
Date: Thu, 8 Jul 2010 18:48:35 -0500
Subject: RE: rebalancing replication help

Thanks Alex.

 

From: Alex Loddengaard [mailto:alex@cloudera.com]
Sent: Thursday, July 08, 2010 11:39 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: rebalancing replication help

 

Hi Arun,

 

Consider setting dfs.balance.bandwidthPerSec to something as high as 20971520 (20 MB/s; the value is in bytes per second) for the balancer and the setrep. You can do this by supplying -D at the command line.
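A back-of-the-envelope sketch of what that number means. It assumes the stock hdfs-default.xml value of 1048576 bytes/s (1 MB/s) as the baseline, which hdfs-default.xml describes as the per-DataNode bandwidth used for balancing; the 1 TB data volume is purely hypothetical.

```python
# Sketch: converting a dfs.balance.bandwidthPerSec value and estimating move time.
# Assumptions: 1048576 bytes/s is the default baseline; the 1 TiB volume is hypothetical.

MIB = 1024 * 1024

def mib_per_sec(bytes_per_sec: int) -> float:
    """Convert a bytes-per-second setting to MiB/s."""
    return bytes_per_sec / MIB

def hours_to_move(total_bytes: int, bytes_per_sec: int) -> float:
    """Hours needed to push total_bytes through one DataNode capped at bytes_per_sec."""
    return total_bytes / bytes_per_sec / 3600

suggested = 20971520   # value suggested in this thread
default = 1 * MIB      # assumed default, 1048576 bytes/s
one_tib = 1024 ** 4    # hypothetical amount of data one node has to move

print(mib_per_sec(suggested))                       # 20.0
print(round(hours_to_move(one_tib, default), 1))    # 291.3
print(round(hours_to_move(one_tib, suggested), 1))  # 14.6
```

Because the cap applies per DataNode, the cluster-wide rate also depends on how many nodes are moving data at once.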

 

Your strategy for getting data onto the 5 nodes is correct: balance and setrep. Just understand these things take time.

 

Hope this helps.

 

Alex

On Wed, Jul 7, 2010 at 4:09 PM, Arun Ramakrishnan <aramakrishnan@languageweaver.com> wrote:

Hi guys.

This is more of a general question than a specific one. I am going to lay out the steps I have taken. Please comment on what I can do better.

 

I was trying to add 5 nodes to my existing 10-node cluster and also increase the replication factor from 2 to 3.

I thought I didn't have to run the balancer, because the new replicas would most likely be placed on the new nodes.

 

There are about 500k blocks.
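The replica-count arithmetic behind that expectation, as a rough sketch. It assumes every block sits at exactly its target replication, all nodes have equal capacity, and "balanced" simply means an equal number of replicas per node (real balancing is by bytes used, and block placement also has to satisfy rules such as rack awareness, so HDFS does not promise this outcome).

```python
# Sketch: replicas per node before and after going from 10 nodes at rep 2
# to 15 nodes at rep 3. Assumptions: equal-capacity nodes, equal-size blocks,
# every block exactly at its target replication.

def replicas_per_node(blocks: int, replication: int, nodes: int) -> float:
    """Average replicas each node holds if replicas are spread evenly."""
    return blocks * replication / nodes

blocks = 500_000
before = replicas_per_node(blocks, 2, 10)   # even share on the old cluster
after = replicas_per_node(blocks, 3, 15)    # even share on the new cluster
new_replicas = blocks * (3 - 2)             # one extra replica per block
new_node_share = after * 5                  # what the 5 new nodes should hold

print(before, after, new_replicas, new_node_share)  # 100000.0 100000.0 500000 500000.0
```

On paper the numbers line up exactly: the 500k new replicas equal the new nodes' even share, which is why "new replicas go to new nodes" looks like it would balance the cluster by itself.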

I wanted to get it all stabilized (replication and balancing) within 24 hours. It's been more than 24 hours now and fsck reports 30% under-replication. Is there a way to force HDFS to balance/replicate more aggressively?
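To put the 24-hour target in perspective, here is a rough sketch of the cluster-wide rate needed to add one replica to each of 500k blocks in that window. The 64 MB average block size is an assumption (it is the Hadoop 0.20 default dfs.block.size, but files often end in smaller blocks, so the real byte volume may be lower).

```python
# Sketch: throughput needed to add one replica to every block before a deadline.
# Assumption: a hypothetical average block size of 64 MiB (default dfs.block.size).

MIB = 1024 * 1024

def required_rate(blocks: int, avg_block_bytes: int, hours: float) -> tuple[float, float]:
    """Return (blocks per second, MiB per second) needed across the whole cluster."""
    seconds = hours * 3600
    return blocks / seconds, blocks * avg_block_bytes / MIB / seconds

blocks_per_s, mib_per_s = required_rate(500_000, 64 * MIB, 24)
print(round(blocks_per_s, 2), round(mib_per_s, 1))  # 5.79 370.4
```

Sustaining several hundred MB/s of background copying on a 15-node cluster while it also serves normal work is a lot to ask, which fits Alex's point that this takes time.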

 

It would be great if someone explained what happens to blocks, and when, in the context of:

1) Rebalancing

2) -setrep

3) Restarting the cluster with a higher/lower replication factor
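A toy model of the difference between cases 2) and 3), hedged: as far as the HDFS documentation goes, the replication factor is recorded per file when the file is created, and dfs.replication is only the default used at creation time, so changing it (with or without a restart) leaves existing files alone, while `hadoop fs -setrep` changes existing files and the NameNode then copies under-replicated blocks in the background. `ToyNamespace` and its methods are hypothetical illustrations, not HDFS classes, and the model ignores directories and recursion.

```python
# Toy model (hypothetical, not HDFS code): replication is stored per file at
# creation time; the configured default only matters for files created later.
from dataclasses import dataclass, field

@dataclass
class ToyNamespace:
    default_replication: int                              # stands in for dfs.replication
    files: dict[str, int] = field(default_factory=dict)   # path -> replication factor

    def create(self, path: str) -> None:
        # A new file picks up whatever the default is at this moment.
        self.files[path] = self.default_replication

    def setrep(self, path: str, replication: int) -> None:
        # Like `hadoop fs -setrep`: changes the target for an existing file.
        # (The real NameNode then schedules copies of under-replicated blocks.)
        self.files[path] = replication

ns = ToyNamespace(default_replication=2)
ns.create("/data/old")
ns.default_replication = 3     # "restart with a higher replication factor"
ns.create("/data/new")
print(ns.files)                # {'/data/old': 2, '/data/new': 3}
ns.setrep("/data/old", 3)
print(ns.files)                # {'/data/old': 3, '/data/new': 3}
```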

 

A few questions and a few issues here.

1) When you restart the cluster with a higher replication value than before, does it also apply to existing blocks, or only to new blocks being created?

2) Does the balancer take under-replicated blocks into account, or does it blindly start moving existing blocks to reach the threshold?
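For question 2), a sketch of the utilization test the HDFS balancer is documented to use: a DataNode counts as balanced when its disk utilization is within `threshold` percentage points (10 by default) of the cluster-wide average, and the balancer moves existing replicas from over-utilized toward under-utilized nodes. The real balancer also distinguishes above-average and below-average nodes; this sketch collapses those into one label. Node names and sizes are hypothetical.

```python
# Sketch of the balancer's threshold test (simplified; names and numbers hypothetical).

def classify(nodes: dict[str, tuple[int, int]], threshold: float = 10.0) -> dict[str, str]:
    """nodes maps name -> (used_bytes, capacity_bytes); returns name -> label."""
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg = 100.0 * total_used / total_cap        # cluster utilization in percent
    labels = {}
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util > avg + threshold:
            labels[name] = "over-utilized"      # candidate source of block moves
        elif util < avg - threshold:
            labels[name] = "under-utilized"     # candidate target of block moves
        else:
            labels[name] = "within threshold"
    return labels

# Ten old nodes at 60% full plus five new, empty nodes, all 1 TB (hypothetical).
TB = 10 ** 12
cluster = {f"old{i}": (6 * TB // 10, TB) for i in range(10)}
cluster.update({f"new{i}": (0, TB) for i in range(5)})
print(classify(cluster))
```

Here the cluster average is 40%, so the old nodes (60%) fall above the 50% upper bound and the new nodes (0%) fall below the 30% lower bound. Note the test looks only at bytes used per node; it has no notion of a block's target replication.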

 

 

A very specific problem: I am having a strange problem where -setrep hangs on one particular block for hours. Is this because the block is corrupt? But fsck said it's healthy.

 

 

Thanks

Arun
