Subject: Re: How to fix Under-replication
From: Keith Wyss <wyssman@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 17 Apr 2013 17:45:26 -0700

In case anyone finds this tucked away on the internet in the future and is
in a situation like ours...

We only had 3 racks, with 4 machines on one rack, compared with almost 70
nodes in our virtually provisioned service.

I found that Hadoop calculates the maximum number of replicas per rack as

    int maxNodesPerRack = (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;

The division is integer division, so with three racks this works out to at
most two replicas per rack, while with two racks you can store three
replicas on a rack. A quick worked example of the arithmetic follows below.
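To make that arithmetic concrete, here is a tiny standalone snippet (not
Hadoop code, just the same expression with the rack count swapped in; the
class name is made up) and what it prints for a replication factor of 3:

    public class MaxReplicasPerRackCheck {
        public static void main(String[] args) {
            int totalNumOfReplicas = 3;  // i.e. dfs.replication
            for (int numOfRacks = 1; numOfRacks <= 4; numOfRacks++) {
                // Same integer division as the line quoted above.
                int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 2;
                System.out.println(numOfRacks + " rack(s) -> at most "
                        + maxNodesPerRack + " replicas per rack");
            }
        }
    }

That prints 4, 3, 2 and 2 for one through four racks, i.e. exactly the
two-per-rack limit that was biting us once our small racks filled up.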
We are decommissioning the 4 nodes so that, even if we don't have rack
awareness, we'll at least get three copies.

The proper way to fix this is to adjust Hadoop's topology script, as
described here:
http://wiki.apache.org/hadoop/topology_rack_awareness_scripts
(If a shell script is awkward in your environment, there is a rough Java
sketch of the same idea at the very bottom of this message, below the
quoted thread.)

Alternatively, HDFS-385 might be included in your distribution, allowing
you to control the block placement policy:
https://issues.apache.org/jira/browse/HDFS-385

Cheers,
Keith


On Wed, Apr 17, 2013 at 1:27 PM, Keith Wyss <wyssman@gmail.com> wrote:
> Hello there.
>
> I am operating a cluster that is consistently unable to create three
> replicas for a large percentage of blocks.
>
> I think I have a good idea of why this is the case, but I would like
> suggestions about how to fix it.
>
> First of all, let's begin with the namenode logs. There are many
> instances of this statement:
>
> WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to
> place enough replicas, still in need of 1
>
> The cluster is only just over 50% full and has well over 3 nodes. This,
> and the absence of other widespread problems, rules out the possibility
> that there is simply not room for the blocks.
>
> That leaves the possibility that the namenode is unable to satisfy the
> block placement policy, which I believe is what is happening.
>
> I read in
> http://www.slideshare.net/cloudera/hadoop-troubleshooting-101-kate-ting-cloudera
> that if there are more than 2 racks, then a block must be present on at
> least two racks.
>
> This makes sense, but our network situation is a little bizarre. It
> consists of:
> - a small number of machines that have a dedicated datacenter/rack/host
>   configuration
>   -- These are spread across a few racks.
> - a large number of machines that are provisioned through an internal
>   hardware-as-a-service provider.
>   -- These are listed as one rack.
>
> The details of the rack allocation for the machines provisioned from the
> hardware-as-a-service provider are abstracted away and are not
> obtainable. The connection to that provider has a lot of bandwidth, so
> this is not as crazy as it sounds.
>
> Our problem is that the machines on all the smaller racks have now filled
> up the space left available by dfs.datanode.du.reserved. This means that
> all blocks written since those machines ran out of space are missing one
> replica.
>
> Is there a way to configure Hadoop to create a third replica anyway
> (aside from changing the topology script implementation)?
>
> What can I do to either confirm or deny my suspicions?
>
> Thanks for your help,
> Keith
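As promised above, here is a rough, untested sketch of the Java route for
the rack mapping. Instead of a shell topology script, Hadoop also lets you
point topology.node.switch.mapping.impl (net.topology.node.switch.mapping.impl
on newer releases) at a class implementing
org.apache.hadoop.net.DNSToSwitchMapping. The class name, the "hwaas-" host
prefix and the choice of four synthetic racks below are all made up for
illustration, and whether carving a virtual pool into synthetic racks makes
sense for your network is a separate question.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.net.DNSToSwitchMapping;
    import org.apache.hadoop.net.NetworkTopology;

    // Minimal sketch: map hosts to rack paths. Hadoop hands resolve() a list
    // of hostnames or IP addresses and expects one rack path per name, in order.
    public class ExampleRackMapping implements DNSToSwitchMapping {

        @Override
        public List<String> resolve(List<String> names) {
            List<String> racks = new ArrayList<String>(names.size());
            for (String name : names) {
                if (name.startsWith("hwaas-")) {
                    // Spread the hardware-as-a-service pool over a few
                    // synthetic racks so the placement policy has more than
                    // one rack with free space to choose from.
                    int bucket = Math.abs(name.hashCode() % 4);
                    racks.add("/virtual/rack" + bucket);
                } else {
                    // Dedicated machines: fall back to the default rack here;
                    // a real mapping would return their actual rack.
                    racks.add(NetworkTopology.DEFAULT_RACK);
                }
            }
            return racks;
        }

        // Declared on the interface in newer Hadoop versions; a harmless
        // extra method on older ones. Nothing is cached here, so it's a no-op.
        public void reloadCachedMappings() {
        }
    }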