hadoop-hdfs-issues mailing list archives

From "Frode Halvorsen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9650) Problem is logging of "Redundant addStoredBlock request received"
Date Fri, 15 Jan 2016 20:39:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102436#comment-15102436 ]

Frode Halvorsen commented on HDFS-9650:

I have an addition:
When I 'cleaned out' the datanode, it started out with 0 blocks, and of course did no
harm to the namenode when restarting.
Today I tried to restart it again, after it had received a few blocks, and the log on the
namenode started filling with those messages again.
This time, however, the datanode didn't have enough blocks to spam the namenode long enough
to take it down. The namenode just reported those redundant addStoredBlock requests for the
new blocks on the datanode, and finished up before the failover controller shut it down...
So right now, I have a few datanodes I cannot restart...
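A quick way to gauge how heavily the namenode log is being spammed is to count the matching lines. A minimal sketch, using a synthetic log excerpt (the real log path and exact line layout will differ per deployment):

```shell
# Count "Redundant addStoredBlock" lines; a synthetic excerpt stands in
# for the real namenode log (path and exact line format are assumptions).
cat > namenode.log.sample <<'EOF'
2016-01-15 20:10:01,100 INFO BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock request received for blk_1073741900 on node dn1:50010
2016-01-15 20:10:01,105 INFO BlockStateChange: BLOCK* addStoredBlock: Redundant addStoredBlock request received for blk_1073741901 on node dn1:50010
2016-01-15 20:10:02,000 INFO namenode.FSNamesystem: unrelated message
EOF
grep -c 'Redundant addStoredBlock request received' namenode.log.sample   # prints 2
```

Watching that count per minute during a datanode restart would show whether the flood is long enough to starve heartbeat handling.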

> Problem is logging of "Redundant addStoredBlock request received"
> -----------------------------------------------------------------
>                 Key: HDFS-9650
>                 URL: https://issues.apache.org/jira/browse/HDFS-9650
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Frode Halvorsen
> Description:
> Hadoop 2.7.1. 2 namenodes in HA. 14 datanodes.
> Enough CPU, disk, and RAM.
> Just discovered that some datanodes must have been corrupted somehow.
> When restarting a 'defect' datanode (it works without failure except when restarting), the
> active namenode suddenly logs a lot of: "Redundant addStoredBlock request received",
> and finally the failover controller takes the namenode down and fails over to the other node.
> This node also starts logging the same, and as soon as the first node is back online, the
> failover controller again kills the active node and fails over.
> The node now active was started after the datanode, doesn't log "Redundant addStoredBlock
> request received" anymore, and a restart of the second namenode works fine.
> If I restart the datanode again, the process repeats itself.
> The problem is the logging of "Redundant addStoredBlock request received", and why does it happen?
> The failover controller acts the same way as it did on 2.5/2.6 when we had a lot of 'block
> does not belong to any replica' messages: the namenode is too busy to respond to heartbeats, and
> is taken down...
> To resolve this, I have to take down the datanode, delete all data from it, and start
> it up. Then the cluster will reproduce the missing blocks, and the failing datanode works
> fine again...
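The workaround in the last quoted paragraph can be sketched as shell steps. A dry-run sketch with stubbed commands, so it is safe to execute outside a cluster; the data dir path is an assumption, and on a real 2.7.x cluster the stop/start would typically be `hadoop-daemon.sh stop datanode` / `hadoop-daemon.sh start datanode`:

```shell
# Dry-run sketch of the "clean out the datanode" workaround. `hdfs` and
# `rm` are stubbed to echo instead of acting, so this runs safely as a
# demo; the block-store path (dfs.datanode.data.dir) is an assumption.
hdfs() { echo "would run: hdfs $*"; }
rm() { echo "would run: rm $*"; }

hdfs --daemon stop datanode     # 1. stop the affected datanode
rm -rf /data/hdfs/dn/current    # 2. delete its block store
hdfs --daemon start datanode    # 3. restart empty; the cluster re-replicates
```

Started with zero blocks, the datanode has nothing to re-report, which matches the observation above that a freshly cleaned node does no harm to the namenode on restart.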

This message was sent by Atlassian JIRA
