Mailing-List: contact hdfs-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-user@hadoop.apache.org
Received-SPF: neutral (nike.apache.org: local policy)
From: Arun Ramakrishnan <aramakrishnan@languageweaver.com>
To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
Date: Thu, 8 Jul 2010 18:48:35 -0500
Subject: RE: rebalancing replciation help
Thread-Topic: rebalancing replciation help
Thread-Index: AcsezQVcPYeC6VbXSnygQTmh5XaSaAAKwowg
Message-ID: 
 <C3AD6464AC81DC4AB14FFEA31391866A7932250406@AUSP01VMBX08.collaborationhost.net>
References: 
 <C3AD6464AC81DC4AB14FFEA31391866A7932140961@AUSP01VMBX08.collaborationhost.net>
 <AANLkTikt2VAXqsIFa1strf1QWryDJzZ8Rcyw1oQbb-t9@mail.gmail.com>
In-Reply-To: <AANLkTikt2VAXqsIFa1strf1QWryDJzZ8Rcyw1oQbb-t9@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
acceptlanguage: en-US
Content-Type: multipart/alternative;
	boundary="_000_C3AD6464AC81DC4AB14FFEA31391866A7932250406AUSP01VMBX08c_"
MIME-Version: 1.0

--_000_C3AD6464AC81DC4AB14FFEA31391866A7932250406AUSP01VMBX08c_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Thanks Alex.

From: Alex Loddengaard [mailto:alex@cloudera.com]
Sent: Thursday, July 08, 2010 11:39 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: rebalancing replciation help

Hi Arun,

Consider setting dfs.balance.bandwidthPerSec to something as high as
20971520 for the balancer and the setrep.  You can do this by supplying -D
at the command line.
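
For a sense of scale, here is a minimal Python sketch that converts the
suggested value from bytes per second to MiB per second and compares it
with 1048576. That second number is assumed here to be the stock default
for this property, so check it against your own hdfs-default.xml. The
helper names are made up for illustration and are not part of Hadoop.

```python
def mib_per_sec(bytes_per_sec):
    """Convert a rate in bytes per second to MiB per second."""
    return bytes_per_sec / (1024 * 1024)


def speedup(new_rate, old_rate):
    """Ratio between two throttle settings given in the same unit."""
    return new_rate / old_rate


# 20971520 is the value suggested in the reply.
print(mib_per_sec(20971520))       # 20.0

# 1048576 is an ASSUMED default; verify it for your Hadoop version.
print(speedup(20971520, 1048576))  # 20.0
```

Under that assumption the suggested setting is a 20x higher ceiling, which
bounds how much faster block moves can go but does not guarantee it.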

Your strategy for getting data onto the 5 nodes is correct: balance and
setrep.  Just understand these things take time.

Hope this helps.

Alex
On Wed, Jul 7, 2010 at 4:09 PM, Arun Ramakrishnan
<aramakrishnan@languageweaver.com> wrote:
Hi guys.
  I have more than one specific question. I am going to lay out the steps I
have taken. Please comment on what I can do better.

  I was trying to add 5 nodes to my existing 10-node cluster and also
increase the replication factor from 2 to 3.
I thought I would not have to run the balancer, because HDFS would most
likely put the new replicas onto the new nodes.

There are about 500k blocks.
I wanted to get it all stabilized (replication and balancing) within 24
hours. It's more than 24 hours now and fsck reports 30% under-replication.
Is there a way to force HDFS to balance/replicate more aggressively?

It would be great if someone explained what happens to blocks, and when, in
the context of:

1)      Rebalancing

2)      -setrep

3)      Restarting cluster with a higher/lower replication factor.

A few questions and a few issues here.

1)      When you restart the cluster with a higher replication value than
before, does it also apply to existing blocks or only to new blocks being
created?

2)      Does the balancer take under-replication of blocks into account, or
does it blindly start moving existing blocks to reach the threshold?


A very specific problem: -setrep hangs on one particular block for hours.
Is this because it is corrupt? But fsck said it is healthy.


Thanks
Arun


--_000_C3AD6464AC81DC4AB14FFEA31391866A7932250406AUSP01VMBX08c_--