hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3452) BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing of lock
Date Tue, 29 May 2012 16:17:23 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284902#comment-13284902
] 

Hadoop QA commented on HDFS-3452:
---------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530052/HDFS-3452-2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified test files.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit
warnings.

    +1 core tests.  The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2532//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2532//console

This message is automatically generated.
                
> BKJM:Switch from standby to active fails and NN gets shut down due to delay in clearing
of lock
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3452
>                 URL: https://issues.apache.org/jira/browse/HDFS-3452
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0-alpha
>            Reporter: suja s
>            Assignee: Uma Maheswara Rao G
>            Priority: Blocker
>         Attachments: BK-253-BKJM.patch, HDFS-3452-1.patch, HDFS-3452-2.patch, HDFS-3452.patch,
HDFS-3452.patch
>
>
> Normal switch fails. 
> (BKjournalManager zk session timeout is 3000 and ZKFC session timeout is 5000. By the
time control comes to acquire lock the previous lock is not released which leads to failure
in lock acquisition by NN and NN gets shutdown. Ideally it should have been done)
> =============================================================================
> 2012-05-09 20:15:29,732 ERROR org.apache.hadoop.contrib.bkjournal.WriteLock: Failed to
acquire lock with /ledgers/lock/lock-0000000007, lock-0000000006 already has it
> 2012-05-09 20:15:29,732 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error:
recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager@412beeec,
stream=null))
> java.io.IOException: Could not acquire lock
> at org.apache.hadoop.contrib.bkjournal.WriteLock.acquire(WriteLock.java:107)
> at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.recoverUnfinalizedSegments(BookKeeperJournalManager.java:406)
> at org.apache.hadoop.hdfs.server.namenode.JournalSet$6.apply(JournalSet.java:551)
> at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:322)
> at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:548)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1134)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:598)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1287)
> at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1219)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:978)
> at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> 2012-05-09 20:15:29,736 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at HOST-XX-XX-XX-XX/XX.XX.XX.XX
> Scenario:
> Start ZKFCS, NNs
> NN1 is active and NN2 is standby
> Stop NN1. NN2 tries to transition to active and gets shut down

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message