hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4138) BackupNode startup fails due to uninitialized edit log
Date Tue, 06 Nov 2012 08:00:13 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-4138:
--------------------------------------

    Attachment: hdfs-4138.patch

Kihwal, your analysis of the problem is absolutely correct. There is a race between startCommonServices(),
which initializes metrics, and runCheckpointDaemon(), which initializes the EditLog.
I also agree that we should be able to move the initialization of BackupImage, along with its EditLog,
out of registerWith() and into BN.loadNamesystem(), but this will require some rework of the current
code.
The simplest fix is to modify the condition in getTransactionsSinceLastLogRoll(), as you did
in your patch, but we should avoid adding another member to FSNamesystem. I did that in
the attached patch.
It becomes a one-line change, although I could not resist removing two redundant fields in
BackupNode, which are unused and duplicated in Storage anyway, and fixing one warning.
I was able to start the BN successfully with this patch.
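
For readers following the thread, here is a minimal sketch of the kind of guard described above. It is not the attached patch: EditLogSketch, NamesystemSketch, openSegment() and logEdit() are hypothetical stand-ins, and the only assumption taken from the discussion is that a metrics getter may be polled before the checkpoint thread has initialized the edit log, so the getter should check the log state first and report zero.

```java
// Hypothetical stand-in for an edit log; NOT the real HDFS class.
class EditLogSketch {
    private boolean segmentOpen = false;
    private long curSegmentTxId = 0;
    private long lastWrittenTxId = 0;

    // Simulates the checkpoint daemon finishing edit log initialization.
    synchronized void openSegment(long firstTxId) {
        curSegmentTxId = firstTxId;
        lastWrittenTxId = firstTxId - 1;
        segmentOpen = true;
    }

    // Records one transaction in the open segment.
    synchronized void logEdit() {
        if (!segmentOpen) {
            throw new IllegalStateException("edit log segment is not open");
        }
        lastWrittenTxId++;
    }

    synchronized boolean isSegmentOpen() {
        return segmentOpen;
    }

    // Fails if called too early, which is the failure mode the guard avoids.
    synchronized long getCurSegmentTxId() {
        if (!segmentOpen) {
            throw new IllegalStateException("edit log segment is not open");
        }
        return curSegmentTxId;
    }

    synchronized long getLastWrittenTxId() {
        return lastWrittenTxId;
    }
}

// Hypothetical stand-in for the namesystem that exposes the metric.
class NamesystemSketch {
    private final EditLogSketch editLog = new EditLogSketch();

    EditLogSketch getEditLog() {
        return editLog;
    }

    // Metric getter: may be polled before the edit log is initialized,
    // so it checks the log state instead of assuming it is ready.
    long getTransactionsSinceLastLogRoll() {
        synchronized (editLog) {
            if (!editLog.isSegmentOpen()) {
                return 0; // nothing to report yet
            }
            return editLog.getLastWrittenTxId() - editLog.getCurSegmentTxId() + 1;
        }
    }
}

class RaceGuardSketch {
    public static void main(String[] args) {
        NamesystemSketch ns = new NamesystemSketch();
        // Metrics polled before the edit log is ready: no exception, reports 0.
        System.out.println(ns.getTransactionsSinceLastLogRoll());
        ns.getEditLog().openSegment(100);
        ns.getEditLog().logEdit();
        ns.getEditLog().logEdit();
        // Two transactions written since the segment was opened.
        System.out.println(ns.getTransactionsSinceLastLogRoll());
    }
}
```

The guard keeps the change local to the getter, which is the appeal of the one-line approach; moving the initialization earlier would remove the race itself but, as noted above, needs more rework.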
                
> BackupNode startup fails due to uninitialized edit log
> ------------------------------------------------------
>
>                 Key: HDFS-4138
>                 URL: https://issues.apache.org/jira/browse/HDFS-4138
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, name-node
>    Affects Versions: 2.0.3-alpha
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: hdfs-4138.patch, hdfs-4138.patch
>
>
> It was noticed through the TestBackupNode.testCheckpointNode failure. When a backup node is
> started, it tries to enter the active state and start common services. But it fails to start
> the services and exits, which is caught by the exit util.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
