hadoop-common-user mailing list archives

From hadoop hive <hadooph...@gmail.com>
Subject Re: Hadoop doesn't work after restart
Date Wed, 24 Jun 2015 15:33:20 GMT
Try running fsck
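
A minimal sketch of what that check could look like, assuming the `hdfs` command is on the PATH of a cluster node and is run as the HDFS superuser. The function name `gather_hdfs_health` and the variable `HDFS_CHECK_STATUS` are made up for illustration. It first asks whether the namenode is still in safe mode and which datanodes it considers live, then runs fsck on the root path.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): collect basic HDFS
# health information before deciding on any repair.
gather_hdfs_health() {
    # Is the namenode still in safe mode after the restart?
    hdfs dfsadmin -safemode get
    # Live and dead datanodes, plus capacity, as the namenode sees them.
    hdfs dfsadmin -report
    # Walk the namespace and report missing, corrupt, or
    # under-replicated blocks. This only reports; it changes nothing.
    hdfs fsck /
}

if command -v hdfs >/dev/null 2>&1; then
    gather_hdfs_health
    # Exit status of the last command in the function (the fsck run).
    HDFS_CHECK_STATUS=$?
else
    echo "hdfs command not found on PATH; run this on a cluster node" >&2
    HDFS_CHECK_STATUS=127
fi
```

On a large namespace, or while the namenode is answering slowly, `hdfs fsck /` may itself take a long time, so it may be worth redirecting its output to a file.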

On Wed, Jun 24, 2015 at 2:54 PM, Ja Sam <ptrstpppp@gmail.com> wrote:

> I had a running Hadoop cluster (version 2.2.0.2.0.6.0-76 from
> Hortonworks). Yesterday a lot of things happened, and at some point
> we decided to reboot all datanodes one by one. Unfortunately, the operator did
> not monitor the namenode health monitor.
>
> As a result of the above operation, all datanodes show as dead nodes,
> all blocks are lost, ...
>
> We rebooted one datanode once again to see whether it would
> log anything interesting. The log ended with:
>
> INFO  ipc.Server (Server.java:run(861)) - IPC Server Responder: starting
> INFO  ipc.Server (Server.java:run(688)) - IPC Server listener on 8010: starting
>
> and it hangs there. At the same time, on the namenode I can see only two types of
> messages:
>
> INFO  hdfs.StateChange (FSNamesystem.java:completeFile(2805)) - DIR* completeFile: [SOME PATH] is closed by DFSClient_NONMAPREDUCE_288661168_33
>
> and a lot of:
>
> WARN  blockmanagement.BlockManager (PendingReplicationBlocks.java:pendingReplicationCheck(249)) - PendingReplicationMonitor timed out blk_1074405820_668233
>
> Today we decided to restart the namenode and all datanodes. After the restart,
> the web page http://[server]:50070/dfshealth.jsp answers VERY slowly. I don't
> see any errors in the log except 5 like the one below:
>
> ERROR datanode.DataNode (DataXceiver.java:run(225)) - maelhd21:50010:DataXceiver error processing WRITE_BLOCK operation  src: /node1:33470 dest: /node3:50010
>
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException:
> Block BP-1037132819-192.168.61.196-1409328081083:blk_1075994366_2257020
> already exists in state FINALIZED and thus cannot be created.
>
> 3 out of 5 nodes show as live, but a refresh of the Hadoop status page takes
> more than 10 minutes.
>
> The question of course is: what should I check or do now?
>
>
> P.S. I asked the same question on StackOverflow:
> http://stackoverflow.com/questions/31020877/datanodes-are-cannot-connect-to-namenode
>
