hadoop-hdfs-issues mailing list archives

From "Akira Ajisaka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11180) Intermittent deadlock in NameNode when failover happens.
Date Mon, 26 Dec 2016 17:09:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15778674#comment-15778674 ]

Akira Ajisaka commented on HDFS-11180:
--------------------------------------

bq. However, even in branch-2 and trunk, FSImage#getLastAppliedOrWrittenTxId internally calls FSEditLog#getLastWrittenTxId.
After this commit, FSImage#getLastAppliedOrWrittenTxId internally calls FSEditLog#getLastWrittenTxIdWithoutLock.
{code:title=HDFS-11180.04.patch}
@@ -1418,6 +1418,15 @@ public synchronized long getLastAppliedTxId() {
 
   public long getLastAppliedOrWrittenTxId() {
     return Math.max(lastAppliedTxId,
+        editLog != null ? editLog.getLastWrittenTxIdWithoutLock() : 0);
+  }
+
+  /**
+   * This method holds a lock of FSEditLog to get the correct value.
+   * This method must not be used for metrics.
+   */
+  public long getCorrectLastAppliedOrWrittenTxId() {
+    return Math.max(lastAppliedTxId,
         editLog != null ? editLog.getLastWrittenTxId() : 0);
   }
{code}
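The split in the patch can be illustrated with a small standalone sketch. The classes below (`EditLogSketch`, `ImageSketch`) are hypothetical stand-ins, not the Hadoop `FSEditLog` and `FSImage` classes. Writers update the transaction ID while holding the object's monitor, the metrics getter reads it without taking that monitor, and the "correct" getter keeps the `synchronized` read. Declaring the field `volatile` so that the unlocked read sees a recently published value is an assumption of this sketch; the patch excerpt above does not show how `getLastWrittenTxIdWithoutLock` is implemented.

```java
/**
 * Hypothetical stand-in for FSEditLog, reduced to the locking pattern
 * discussed above. Not the Hadoop class.
 */
class EditLogSketch {
  // volatile (an assumption of this sketch) so that an unlocked reader
  // observes the most recently published txid.
  private volatile long txid = 0;

  /** Writers advance the txid while holding this object's monitor. */
  public synchronized long logEdit() {
    return ++txid;
  }

  /** Locked read: must acquire the monitor, so it can block behind a writer. */
  public synchronized long getLastWrittenTxId() {
    return txid;
  }

  /** Lock-free read: never takes the monitor, so metrics cannot block on it. */
  public long getLastWrittenTxIdWithoutLock() {
    return txid;
  }
}

/** Hypothetical stand-in for the two FSImage accessors in the patch. */
class ImageSketch {
  private final long lastAppliedTxId;
  private final EditLogSketch editLog; // may be null, as the patch allows

  ImageSketch(long lastAppliedTxId, EditLogSketch editLog) {
    this.lastAppliedTxId = lastAppliedTxId;
    this.editLog = editLog;
  }

  /** Metrics path: never acquires the edit-log monitor. */
  public long getLastAppliedOrWrittenTxId() {
    return Math.max(lastAppliedTxId,
        editLog != null ? editLog.getLastWrittenTxIdWithoutLock() : 0);
  }

  /** Other callers: acquire the edit-log monitor for an up-to-date value. */
  public long getCorrectLastAppliedOrWrittenTxId() {
    return Math.max(lastAppliedTxId,
        editLog != null ? editLog.getLastWrittenTxId() : 0);
  }
}
```

The trade-off is that the lock-free value may briefly lag a concurrent writer. That is acceptable for metrics, but not for code paths that need the exact last written transaction ID, which is why the locked variant is kept alongside it.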
In addition, I've added a regression test to verify that FSNamesystem metrics don't synchronize on FSEditLog. Please see the test for details.
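The idea behind such a regression test can be sketched as: hold the lock in one thread, perform the read in another, and fail if the read does not finish within a timeout. This is a hypothetical standalone sketch, not the test from the patch; `LockedLog` and `completesWhileLocked` are illustrative names, and the object's monitor stands in for the FSEditLog lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical object whose monitor stands in for the FSEditLog lock. */
class LockedLog {
  private volatile long txid = 42;

  public synchronized long getLastWrittenTxId() {
    return txid;
  }

  public long getLastWrittenTxIdWithoutLock() {
    return txid;
  }
}

final class LockFreeReadCheck {
  private LockFreeReadCheck() {}

  /**
   * Holds the monitor of {@code log} in the calling thread, runs {@code read}
   * on another thread, and returns true if the read finished within
   * {@code timeoutMs} milliseconds while the monitor was still held.
   */
  static boolean completesWhileLocked(LockedLog log, LongSupplier read,
                                      long timeoutMs) {
    CountDownLatch done = new CountDownLatch(1);
    Thread reader = new Thread(() -> {
      read.getAsLong();
      done.countDown();
    });
    reader.setDaemon(true); // do not let the reader thread keep the JVM alive
    // If the read blocks on the monitor, it resumes once this block exits.
    synchronized (log) {
      reader.start();
      try {
        return done.await(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A test against a real NameNode would instead hold the monitor of the actual edit log object and issue the metrics query (for example over JMX) from the second thread; the sketch only demonstrates the hold-the-lock-then-time-out pattern.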

> Intermittent deadlock in NameNode when failover happens.
> --------------------------------------------------------
>
>                 Key: HDFS-11180
>                 URL: https://issues.apache.org/jira/browse/HDFS-11180
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Abhishek Modi
>            Assignee: Akira Ajisaka
>            Priority: Blocker
>              Labels: high-availability
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2, 2.6.6
>
>         Attachments: HDFS-11180-branch-2.01.patch, HDFS-11180-branch-2.6.01.patch, HDFS-11180-branch-2.7.01.patch, HDFS-11180-branch-2.8.01.patch, HDFS-11180.00.patch, HDFS-11180.01.patch, HDFS-11180.02.patch, HDFS-11180.03.patch, HDFS-11180.04.patch, jstack.log
>
>
> It happens when metrics are being updated at the same time as a failover. Please find the attached jstack from that point in time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


