hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3769) standby namenode become active fails because starting log segment fail on shared storage
Date Fri, 17 Aug 2012 19:37:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437002#comment-13437002
] 

Uma Maheswara Rao G commented on HDFS-3769:
-------------------------------------------

Hi Liaowenrui,

This problem we have fixed in BK side right? is that the same problem you are talking here?
On BKs unavailability, bk recovery will fail on single attempt and lead to close the ledger
with -1 on one corner case. SO, when other node becomes active, that will clean this ledger
as lastConfiem is -1. And leading that entry miss right. If that is the case, we already handled
at BK side.

Please confirm the same. If this issue is same we can close this JIRA here.
                
> standby namenode become active fails because starting log segment fail on shared storage
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-3769
>                 URL: https://issues.apache.org/jira/browse/HDFS-3769
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.0.0-alpha
>         Environment: 3 datanode:158.1.132.18,158.1.132.19,160.161.0.143
> 2 namenode:158.1.131.18,158.1.132.19
> 3 zk:158.1.132.18,158.1.132.19,160.161.0.143
> 3 bookkeeper:158.1.132.18,158.1.132.19,160.161.0.143
> ensemble-size:2,quorum-size:2
>            Reporter: liaowenrui
>            Priority: Critical
>             Fix For: 2.1.0-alpha, 2.0.1-alpha
>
>
> 2012-08-06 15:09:46,264 ERROR org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper:
Node /ledgers/available already exists and this is not a retry
> 2012-08-06 15:09:46,264 INFO org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager:
Successfully created bookie available path : /ledgers/available
> 2012-08-06 15:09:46,273 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager:
Recovering unfinalized segments in /opt/namenodeHa/hadoop-2.0.1/hadoop-root/dfs/name/current
> 2012-08-06 15:09:46,277 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching
up to latest edits from old active before taking over writer role in edits logs.
> 2012-08-06 15:09:46,363 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing
replication and invalidation queues...
> 2012-08-06 15:09:46,363 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager:
Marking all datandoes as stale
> 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
Total number of blocks            = 239
> 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
Number of invalid blocks          = 0
> 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
Number of under-replicated blocks = 0
> 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
Number of  over-replicated blocks = 0
> 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
Number of blocks being written    = 0
> 2012-08-06 15:09:46,383 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will
take over writing edit logs at txnid 2354
> 2012-08-06 15:09:46,471 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting
log segment at 2354
> 2012-08-06 15:09:46,472 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error:
starting log segment 2354 failed for required journal (JournalAndStream(mgr=org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager@4eda1515,
stream=null))
> java.io.IOException: We've already seen 2354. A new stream cannot be created with it
>         at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.startLogSegment(BookKeeperJournalManager.java:297)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:86)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet$2.apply(JournalSet.java:182)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:319)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:179)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:894)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:268)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:618)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1322)
>         at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
>         at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
>         at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1230)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:990)
>         at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
>         at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>                                                                                     
                                 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message