hadoop-hdfs-issues mailing list archives

From "Hari Mankude (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2753) Standby namenode stuck in safemode during a failover
Date Thu, 05 Jan 2012 21:56:39 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Mankude updated HDFS-2753:
-------------------------------

    Summary: Standby namenode stuck in safemode during a failover  (was: Standby namenode
stuck in safenode during a failover)
    
> Standby namenode stuck in safemode during a failover
> ----------------------------------------------------
>
>                 Key: HDFS-2753
>                 URL: https://issues.apache.org/jira/browse/HDFS-2753
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>
> teragen is run to generate write traffic, and a manual failover is initiated by killing the
namenode process. The killed namenode is then restarted and comes up as the standby. However,
the standby never exits safemode. The standby logs show that the standby namenode receives
addStoredBlock calls immediately after datanode registration, which increments numBlocks.
An optimization in processReport() then causes the subsequent complete block report to be
ignored while the NN is in safemode, so the NN never exits safemode.
> 2012-01-05 18:57:46,030 INFO  hdfs.StateChange (DatanodeManager.java:registerDatanode(573))
- BLOCK* NameSystem.registerDatanode: node registration from 98.137.233.235:50010 storage
DS-526656430-98.137.233.235-50010-1325723536492
> 2012-01-05 18:57:46,033 INFO  net.NetworkTopology (NetworkTopology.java:add(344)) - Adding
a new node: /default-rack/98.137.233.235:50010
> 2012-01-05 18:57:46,033 INFO  namenode.FSNamesystem (FSNamesystem.java:checkMode(3411))
- DID NOT call initialize at 1
> 2012-01-05 18:57:46,034 INFO  hdfs.StateChange (DatanodeManager.java:registerDatanode(573))
- BLOCK* NameSystem.registerDatanode: node registration from 98.137.233.237:50010 storage
DS-1961520590-98.137.233.237-50010-1325725253057
> 2012-01-05 18:57:46,034 INFO  net.NetworkTopology (NetworkTopology.java:add(344)) - Adding
a new node: /default-rack/98.137.233.237:50010
> 2012-01-05 18:57:46,042 INFO  namenode.FSNamesystem (FSNamesystem.java:checkMode(3411))
- DID NOT call initialize at 1
> 2012-01-05 18:57:46,045 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1775))
- BLOCK* addStoredBlock: blockMap updated: 98.137.233.235:50010 is added to blk_-3183325095022454724_1172{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED]]}
size 0
> 2012-01-05 18:57:46,046 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1775))
- BLOCK* addStoredBlock: blockMap updated: 98.137.233.235:50010 is added to blk_5617057825952660916_1173{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED]]}
size 0
> 2012-01-05 18:57:46,046 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1775))
- BLOCK* addStoredBlock: blockMap updated: 98.137.233.237:50010 is added to blk_-3183325095022454724_1172{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED], ReplicaUnderConstruction[98.137.233.237:50010|FINALIZED]]}
size 0
> 2012-01-05 18:57:46,046 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1775))
- BLOCK* addStoredBlock: blockMap updated: 98.137.233.237:50010 is added to blk_5617057825952660916_1173{blockUCState=UNDER_CONSTRUCTION,
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED], ReplicaUnderConstruction[98.137.233.237:50010|FINALIZED]]}
size 0
> 2012-01-05 18:57:46,049 INFO  hdfs.StateChange (BlockManager.java:processReport(1365))
- BLOCK* processReport: discarded non-initial block report from 98.137.233.235:50010 because
namenode still in startup phase
> 2012-01-05 18:57:46,049 INFO  hdfs.StateChange (BlockManager.java:processReport(1365))
- BLOCK* processReport: discarded non-initial block report from 98.137.233.237:50010 because
namenode still in startup phase
> 2012-01-05 18:58:05,167 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894))
- Required GS=1175, Queuing blockReceivedAndDeleted message
> 2012-01-05 18:58:05,168 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894))
- Required GS=1175, Queuing blockReceivedAndDeleted message
> 2012-01-05 18:58:06,634 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894))
- Required GS=1176, Queuing blockReceivedAndDeleted message
> 2012-01-05 18:58:06,636 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894))
- Required GS=1176, Queuing blockReceivedAndDeleted message
> 2012-01-05 18:58:08,097 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894))
- Required GS=1177, Queuing blockReceivedAndDeleted message
> 2012-01-05 18:58:08,097 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894))
- Required GS=1177, Queuing blockReceivedAndDeleted message
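
The interaction described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the HDFS code: the class ToyNameNode, its fields, and the skipNonInitialReports flag are hypothetical names invented for illustration, and the rule "a datanode that already has stored blocks is treated as having sent its initial report" is an assumption drawn from the description and the "discarded non-initial block report" log lines.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only; none of these names are the real HDFS classes or fields. */
public class SafeModeSketch {

    static class ToyNameNode {
        private final int totalBlocks;                       // blocks the namespace expects
        private final Set<Long> safeBlocks = new HashSet<>(); // blocks with a reported replica
        private final Map<String, Integer> numBlocks = new HashMap<>(); // per-datanode count
        private final boolean skipNonInitialReports;          // the assumed optimization

        ToyNameNode(int totalBlocks, boolean skipNonInitialReports) {
            this.totalBlocks = totalBlocks;
            this.skipNonInitialReports = skipNonInitialReports;
        }

        /** Incremental notification for one replica; bumps the datanode's block count. */
        void addStoredBlock(String datanode, long blockId) {
            numBlocks.merge(datanode, 1, Integer::sum);
            safeBlocks.add(blockId);
        }

        /** Full block report; returns false when the report is discarded. */
        boolean processReport(String datanode, List<Long> blockIds) {
            boolean looksNonInitial = numBlocks.getOrDefault(datanode, 0) > 0;
            if (inSafeMode() && skipNonInitialReports && looksNonInitial) {
                return false; // assumed behaviour: report ignored during startup
            }
            for (long id : blockIds) {
                addStoredBlock(datanode, id);
            }
            return true;
        }

        /** Safemode ends only once every expected block has been reported. */
        boolean inSafeMode() {
            return safeBlocks.size() < totalBlocks;
        }
    }

    public static void main(String[] args) {
        List<Long> fullReport = Arrays.asList(1L, 2L, 3L);

        // One addStoredBlock arrives before the full report, as in the logs.
        ToyNameNode withSkip = new ToyNameNode(3, true);
        withSkip.addStoredBlock("dn1", 1L);
        boolean accepted = withSkip.processReport("dn1", fullReport);
        System.out.println("skip on:  accepted=" + accepted
                + " inSafeMode=" + withSkip.inSafeMode());

        // Same sequence with the assumed optimization disabled.
        ToyNameNode withoutSkip = new ToyNameNode(3, false);
        withoutSkip.addStoredBlock("dn1", 1L);
        accepted = withoutSkip.processReport("dn1", fullReport);
        System.out.println("skip off: accepted=" + accepted
                + " inSafeMode=" + withoutSkip.inSafeMode());
    }
}
```

In this model the early addStoredBlock makes the datanode look as if it had already reported, so the full report is dropped and the safe-block count never reaches the threshold. Whether the actual fix should change the addStoredBlock path or the processReport check is not something this sketch tries to answer.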

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
