Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of hadoophive@gmail.com designates
 209.85.213.170 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAKvih5urKMhx1=B_nKZbFDTObbdJA-R2sMCBZoSCQnLu7oGbPg@mail.gmail.com>
References: 
 <CAKvih5urKMhx1=B_nKZbFDTObbdJA-R2sMCBZoSCQnLu7oGbPg@mail.gmail.com>
Date: Wed, 24 Jun 2015 21:03:20 +0530
Message-ID: 
 <CADtHtMwM+ZYNEUDQGERv4b0+F3GR9S9AW=16b0cy7vBuqj5QVA@mail.gmail.com>
Subject: Re: Hadoop doesn't work after restart
From: hadoop hive <hadoophive@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=bcaec5196a2d0ca5ed0519453a4f

--bcaec5196a2d0ca5ed0519453a4f
Content-Type: text/plain; charset=UTF-8

Try running fsck
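
In case it helps, here is a minimal sketch of how that check could be
scripted. It assumes the `hdfs` command is on the PATH of the user running
it. The helper names `fsck_command` and `run_fsck` are made up for this
example; `-files`, `-blocks` and `-locations` are the usual fsck options
for per-file block details.

```python
import subprocess


def fsck_command(path="/", details=False):
    """Build the argument list for an HDFS fsck run on `path`."""
    cmd = ["hdfs", "fsck", path]
    if details:
        # Ask fsck to list each file, its blocks, and the datanodes holding them.
        cmd += ["-files", "-blocks", "-locations"]
    return cmd


def run_fsck(path="/", details=False):
    """Run fsck and return (exit_code, stdout). Assumes `hdfs` is installed."""
    result = subprocess.run(
        fsck_command(path, details),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

Calling `run_fsck("/")` on a machine with a configured client should return
the fsck report as text. On a namenode that is already answering slowly,
it may be gentler to start with a smaller path than `/`.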

On Wed, Jun 24, 2015 at 2:54 PM, Ja Sam <ptrstpppp@gmail.com> wrote:

> I had a running Hadoop cluster (version 2.2.0.2.0.6.0-76 from
> Hortonworks). Yesterday a lot of things happened, and at some point we
> decided to reboot all datanodes one by one. Unfortunately, the operator
> did not monitor the namenode health monitor.
>
> As a result, all datanodes show as dead nodes, all blocks are
> lost, ... .
>
> We rebooted one datanode once more to see whether it would log
> anything interesting. Its log ended with:
>
> INFO  ipc.Server (Server.java:run(861)) - IPC Server Responder: starting
> INFO  ipc.Server (Server.java:run(688)) - IPC Server listener on 8010: starting
>
> and hangs there. At the same time, on the namenode I can see only two
> types of messages:
>
> INFO  hdfs.StateChange (FSNamesystem.java:completeFile(2805)) - DIR* completeFile: [SOME PATH] is closed by DFSClient_NONMAPREDUCE_288661168_33
>
> and a lot of:
>
> WARN  blockmanagement.BlockManager (PendingReplicationBlocks.java:pendingReplicationCheck(249)) - PendingReplicationMonitor timed out blk_1074405820_668233
>
> Today we decided to restart the namenode and all datanodes. After the
> restart, the web page http://[server]:50070/dfshealth.jsp responds VERY
> slowly. I don't see any errors in the log except 5 like the one below:
>
>  ERROR datanode.DataNode (DataXceiver.java:run(225)) - maelhd21:50010:DataXceiver error processing WRITE_BLOCK operation  src: /node1:33470 dest: /node3:50010
>
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException:
> Block BP-1037132819-192.168.61.196-1409328081083:blk_1075994366_2257020
> already exists in state FINALIZED and thus cannot be created.
>
> 3 out of 5 nodes show as live, but refreshing the Hadoop status page
> takes more than 10 minutes.
>
> The question of course is: what should I check or do now?
>
>
> p.s. I asked the same question on StackOverflow:
> http://stackoverflow.com/questions/31020877/datanodes-are-cannot-connect-to-namenode
>
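
To get a feel for how many of the messages quoted above are in the logs,
something like the following could be used. This is only a sketch: the
three patterns are copied from the log lines in your mail, and the names
`PATTERNS` and `tally` are invented for the example.

```python
import re
from collections import Counter

# Substrings taken from the log lines quoted in the original message.
PATTERNS = {
    "pending_replication_timeout": re.compile(
        r"PendingReplicationMonitor timed out blk_\d+_\d+"
    ),
    "replica_already_exists": re.compile(r"ReplicaAlreadyExistsException"),
    "write_block_error": re.compile(r"DataXceiver error processing WRITE_BLOCK"),
}


def tally(lines):
    """Count how many lines match each known message pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

It takes any iterable of lines, so an open log file can be passed in
directly, e.g. `tally(open(log_path))` with `log_path` pointing at
whichever namenode or datanode log you want to inspect.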

--bcaec5196a2d0ca5ed0519453a4f--