From: Jeff Bean <jwfbean@cloudera.com>
To: hdfs-user@hadoop.apache.org
Date: Wed, 11 May 2011 14:02:36 -0700
Subject: Re: Any fix for this?

If I understand correctly, the datanode reports its blocks based on the
contents of dfs.data.dir. When you cloned the datanode, you cloned all of
its blocks as well. When you add a "fresh" datanode to the cluster, you add
one that has an empty dfs.data.dir.

Try clearing out dfs.data.dir before adding the new node (rough steps are
sketched below the quoted message).

Jeff

On Wed, May 11, 2011 at 1:59 PM, Steve Cohen <mail4steve@gmail.com> wrote:

> Hello,
>
> We are running an HDFS cluster and we decided we wanted to add a new
> datanode. Since we are using virtual machines, we just cloned an existing
> datanode, added it to the slaves list, and started up the cluster. We then
> started getting log messages like this in the namenode log:
>
> 2011-05-11 15:59:44,148 ERROR hdfs.StateChange - BLOCK*
> NameSystem.getDatanode: Data node 10.104.211.58:50010 is attempting to
> report storage ID DS-1360904153-10.104.211.57-50010-1293288346692. Node
> 10.104.211.57:50010 is expected to serve this storage.
> 2011-05-11 15:59:46,975 ERROR hdfs.StateChange - BLOCK*
> NameSystem.getDatanode: Data node 10.104.211.57:50010 is attempting to
> report storage ID DS-1360904153-10.104.211.57-50010-1293288346692. Node
> 10.104.211.58:50010 is expected to serve this storage.
>
> I understand that this is because the two datanodes have exactly the same
> storage information, so the first datanode that connects takes precedence.
>
> Is it possible to just wipe one of the datanodes so it is blank, or do we
> have to format the entire HDFS filesystem from the namenode to add the new
> datanode?
>
> Thanks,
> Steve Cohen
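A rough sketch of the "clear out dfs.data.dir" step on the cloned node, assuming
dfs.data.dir in hdfs-site.xml points at /data/dfs/dn (that path and the
hadoop-daemon.sh invocations are placeholders for a stock 0.20-style layout;
substitute whatever your config and init scripts actually use):

    # on the cloned datanode only
    bin/hadoop-daemon.sh stop datanode    # make sure the datanode is down
    rm -rf /data/dfs/dn/*                 # removes the cloned blocks and current/VERSION,
                                          # which is where the duplicated storage ID lives
    bin/hadoop-daemon.sh start datanode   # comes back up with empty storage

With dfs.data.dir empty, the datanode initializes fresh storage on startup and
gets a new storage ID when it registers, so the "is expected to serve this
storage" errors should stop. There is no need to reformat HDFS from the namenode.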