hadoop-hdfs-issues mailing list archives

From "Han Xiao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4154) BKJM: Two namenodes usng bkjm can race to create the version znode
Date Wed, 22 May 2013 09:32:22 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663962#comment-13663962 ]

Han Xiao commented on HDFS-4154:
--------------------------------

Hi Uma,
There is already a test case for concurrent format: TestBookKeeperJournalManager.testConcurrentFormat.
Its expected behavior is to catch the IOException and treat it as a good exception:
            } catch (IOException ioe) {
              LOG.info("Exception formatting ", ioe);
              return ThreadStatus.GOODEXCEPTION;
            }
However, when the NN starts, this IOException causes the NN startup to fail. So
the IOException should not be treated as a good exception.
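
To illustrate why this IOException is a real failure, here is a minimal in-memory sketch of the check-then-create race on the version znode. It is not the BKJM or ZooKeeper API: the class VersionRaceSketch, its exists/create methods, and the status names COMPLETED and BADEXCEPTION are assumptions made for the illustration, and a CyclicBarrier is used only to force both threads past the existence check before either one creates the node.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-memory stand-in for the znode tree; not the real client API.
final class VersionRaceSketch {
  enum ThreadStatus { COMPLETED, BADEXCEPTION }

  private final ConcurrentHashMap<String, String> znodes = new ConcurrentHashMap<>();

  boolean exists(String path) {
    return znodes.containsKey(path);
  }

  // Mimics a create call that fails when the node is already present.
  void create(String path, String data) throws IOException {
    if (znodes.putIfAbsent(path, data) != null) {
      throw new IOException("NodeExists for " + path);
    }
  }

  // Runs two formatters that both see "no version znode" and then both create it.
  static List<ThreadStatus> raceTwoFormatters() {
    VersionRaceSketch zk = new VersionRaceSketch();
    CyclicBarrier bothChecked = new CyclicBarrier(2);
    Callable<ThreadStatus> formatter = () -> {
      try {
        boolean present = zk.exists("/hdfsjournal/version");
        bothChecked.await(); // both threads have checked before either creates
        if (!present) {
          zk.create("/hdfsjournal/version", "v1");
        }
        return ThreadStatus.COMPLETED;
      } catch (IOException ioe) {
        // In a starting NN this exception aborts startup, so count it as bad.
        return ThreadStatus.BADEXCEPTION;
      }
    };
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Future<ThreadStatus>> futures = new ArrayList<>();
      futures.add(pool.submit(formatter));
      futures.add(pool.submit(formatter));
      List<ThreadStatus> results = new ArrayList<>();
      for (Future<ThreadStatus> f : futures) {
        results.add(f.get());
      }
      return results;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

In this sketch exactly one of the two formatters loses the race and reports BADEXCEPTION, which corresponds to the NN that fails to start.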
I revised the test case to treat IOException as bad as well. However, with the patch applied
the test case still cannot pass (it also fails before the patch). I found that the problem comes
from:
      // delete old info
      if (zkc.exists(basePath, false) != null) {
        if (zkc.exists(ledgerPath, false) != null) {
          for (EditLogLedgerMetadata l : getLedgerList(true)) {
            try {
              bkc.deleteLedger(l.getLedgerId());
            } catch (BKException.BKNoSuchLedgerExistsException bke) {
              LOG.warn("Ledger " + l.getLedgerId() + " does not exist;"
                       + " Cannot delete.");
            }
          }
        }
        ZKUtil.deleteRecursive(zkc, basePath);
      }
Both bkc and ZKUtil may throw exceptions under concurrent execution. The conflict-resolving approach
of the revision is not suitable for them and would also be ugly.
Therefore, I want to use a zk-lock to resolve the conflict thoroughly. What do you think?
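
As a rough sketch of the zk-lock idea, the following uses an in-memory map in place of ZooKeeper so that it runs on its own. Everything here is an assumption for illustration: the class FormatLockSketch, the lock path suffix "-formatlock", and the spin-with-sleep lock are hypothetical and are not taken from the patch. A real implementation would more likely use an ephemeral znode with a watch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-memory stand-in for the znode tree; not the real client API.
final class FormatLockSketch {
  private final ConcurrentHashMap<String, String> znodes = new ConcurrentHashMap<>();

  void create(String path, String data) throws IOException {
    if (znodes.putIfAbsent(path, data) != null) {
      throw new IOException("NodeExists for " + path);
    }
  }

  void delete(String path) throws IOException {
    if (znodes.remove(path) == null) {
      throw new IOException("NoNode for " + path);
    }
  }

  // Retry until the lock node can be created; only one holder at a time.
  void lock(String lockPath) throws InterruptedException {
    while (znodes.putIfAbsent(lockPath, "held") != null) {
      Thread.sleep(1);
    }
  }

  void unlock(String lockPath) {
    znodes.remove(lockPath);
  }

  // Delete old info and recreate the version node while holding the lock.
  void format(String basePath) throws IOException, InterruptedException {
    // The lock node is kept outside basePath so that deleting basePath
    // contents cannot remove the lock itself.
    String lockPath = basePath + "-formatlock";
    lock(lockPath);
    try {
      if (znodes.containsKey(basePath + "/version")) {
        delete(basePath + "/version");
      }
      create(basePath + "/version", "v1");
    } finally {
      unlock(lockPath);
    }
  }

  // Runs the given number of concurrent formatters and counts IOExceptions.
  static int countFailures(int formatters) {
    FormatLockSketch zk = new FormatLockSketch();
    ExecutorService pool = Executors.newFixedThreadPool(formatters);
    try {
      List<Future<Boolean>> futures = new ArrayList<>();
      for (int i = 0; i < formatters; i++) {
        futures.add(pool.submit(() -> {
          try {
            zk.format("/hdfsjournal");
            return true;
          } catch (IOException ioe) {
            return false; // any IOException counts as a failed format
          }
        }));
      }
      int failures = 0;
      for (Future<Boolean> f : futures) {
        if (!f.get()) {
          failures++;
        }
      }
      return failures;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the exists, delete, and create steps all run while the lock is held, no formatter in this sketch sees a NodeExists or NoNode error, so countFailures reports zero for any number of formatters.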
                
> BKJM: Two namenodes usng bkjm can race to create the version znode
> ------------------------------------------------------------------
>
>                 Key: HDFS-4154
>                 URL: https://issues.apache.org/jira/browse/HDFS-4154
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: 3.0.0, 2.0.3-alpha
>            Reporter: Ivan Kelly
>            Assignee: Han Xiao
>         Attachments: HDFS-4154.patch
>
>
> nd one will get the following error.
> 2012-11-06 10:04:00,200 INFO hidden.bkjournal.org.apache.zookeeper.ClientCnxn: Session establishment complete on server 109-231-69-172.flexiscale.com/109.231.69.172:2181, sessionid = 0x13ad528fcfe0005, negotiated timeout = 4000
> 2012-11-06 10:04:00,710 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
> java.lang.IllegalArgumentException: Unable to construct journal, bookkeeper://109.231.69.172:2181;109.231.69.173:2181;109.231.69.174:2181/hdfsjournal
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1251)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initSharedJournalsForRead(FSEditLog.java:206)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:657)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:590)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:259)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:544)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:423)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:385)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:401)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:435)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:611)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:592)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1135)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1201)
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1249)
>         ... 14 more
> Caused by: java.io.IOException: Error initializing zk
>         at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.<init>(BookKeeperJournalManager.java:233)
>         ... 19 more
> Caused by: hidden.bkjournal.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hdfsjournal/version
>         at hidden.bkjournal.org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at hidden.bkjournal.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at hidden.bkjournal.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:778)
>         at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.<init>(BookKeeperJournalManager.java:222)
>         ... 19 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
