hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them
Date Wed, 01 Feb 2017 22:39:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849052#comment-15849052 ]

Jing Zhao commented on HDFS-4025:

Thanks for updating the patch, [~hanishakoneru]. The latest patch looks pretty good to
me. Some minor comments:
# In hdfs-default.xml, "i" --> "if"
+  <name>dfs.journalnode.enable.sync</name>
+  <value>true</value>
+  <description>
+    If true, the journal nodes wil sync with each other. The journal nodes
+    will periodically gossip with other journal nodes to compare edit log
+    manifests and i they detect any missing log segment, they will download
+    it from the other journal nodes.
+  </description>
# In JournalNodeSyncer.java, the following code will throw an {{UnsupportedOperationException}}
since {{thisJournalEditLogs}} is an immutable list. In fact, this add operation can be skipped.
          if (success) {
# Maybe "Transferring" can be changed to "Downloading"?
LOG.info("Transferring Missing Edit Log from " + url + " to " + jnStorage
# In the following log message, {{finalEditsFile}} should be {{tmpEditsFile}}.
    LOG.info("Downloaded file " + tmpEditsFile.getName() + " size " +
        finalEditsFile.length() + " bytes.");
# In {{TestJournalNodeSync}}, {{jid}} can be declared as final, and {{editLogExists}} can
be private.
# For {{deleteEditLog}}, we can either change the while loop to an if, or refresh the {{logFile}}
instance within the while loop.
+   while (logFile.isInProgress()) {
+      dfsCluster.getNameNode(0).getRpcServer().rollEditLog();
# The following code can be simplified to "Assert.assertTrue("Couldn't delete edit log file", deleteFile.delete());"
+    if (!deleteFile.delete()) {
+      assert false: "Couldn't delete edit log file";
+      return null;
+    }
# In {{generateEditLog}}, let's also check the result of {{doAnEdit}}, i.e., use "Assert.assertTrue(doAnEdit());".
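The {{UnsupportedOperationException}} point in the JournalNodeSyncer item can be reproduced outside the patch. The sketch below stands in for {{thisJournalEditLogs}} with a hypothetical list of segment names wrapped by {{Collections.unmodifiableList}}; the comment only says the list is immutable, so how the patch actually builds it is an assumption here, and {{ImmutableListAddDemo}} and {{addIsRejected}} are illustrative names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ImmutableListAddDemo {

  /** Returns true if add() on the given list throws UnsupportedOperationException. */
  static boolean addIsRejected(List<String> logs, String segment) {
    try {
      logs.add(segment);
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    List<String> backing = new ArrayList<>();
    backing.add("segment-1-10"); // hypothetical segment name

    // Hypothetical stand-in for thisJournalEditLogs: a read-only view.
    List<String> thisJournalEditLogs = Collections.unmodifiableList(backing);

    // The read-only view rejects the add, as described in the review comment.
    System.out.println(addIsRejected(thisJournalEditLogs, "segment-11-20")); // true
    // A mutable list accepts the same add.
    System.out.println(addIsRejected(backing, "segment-11-20")); // false
  }
}
```

Skipping the add, as suggested, avoids the exception entirely; if the local list really had to grow, a mutable copy such as {{new ArrayList<>(thisJournalEditLogs)}} would be needed instead.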

> QJM: Sychronize past log segments to JNs that missed them
> ---------------------------------------------------------
>                 Key: HDFS-4025
>                 URL: https://issues.apache.org/jira/browse/HDFS-4025
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Hanisha Koneru
>             Fix For: QuorumJournalManager (HDFS-3077)
>         Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, HDFS-4025.002.patch, HDFS-4025.003.patch,
HDFS-4025.004.patch, HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch, HDFS-4025.008.patch,
> Currently, if a JournalManager crashes and misses some segment of logs, and then comes
back, it will be re-added as a valid part of the quorum on the next log roll. However, it
will not have a complete history of log segments (i.e., any individual JN may have gaps in its
transaction history). This mirrors the behavior of the NameNode when there are multiple local
directories specified.
> However, it would be better if a background thread noticed these gaps and "filled them
in" by grabbing the segments from other JournalNodes. This increases the resilience of the
system when JournalNodes get reformatted or otherwise lose their local disk.
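The background gap-filling described above hinges on comparing edit log manifests between JournalNodes. As a minimal sketch, assuming each finalized segment is identified only by its first and last transaction IDs, the comparison could look like the following. {{SegmentGapFinder}}, {{Segment}}, and {{findMissing}} are hypothetical names, not part of the patch or the Hadoop API, and the rule used here (a remote segment counts as missing when it overlaps no local segment) is an assumption that the actual {{JournalNodeSyncer}} may implement differently.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentGapFinder {

  /** Hypothetical finalized segment covering transactions [startTxId, endTxId]. */
  static final class Segment {
    final long startTxId;
    final long endTxId;

    Segment(long startTxId, long endTxId) {
      this.startTxId = startTxId;
      this.endTxId = endTxId;
    }

    @Override
    public String toString() {
      return "[" + startTxId + "-" + endTxId + "]";
    }
  }

  /**
   * Returns the remote segments whose transaction range overlaps no local
   * segment; these are the candidates to download from the other JN.
   */
  static List<Segment> findMissing(List<Segment> local, List<Segment> remote) {
    List<Segment> missing = new ArrayList<>();
    for (Segment r : remote) {
      boolean covered = false;
      for (Segment l : local) {
        // Two closed ranges overlap if each starts before the other ends.
        if (l.startTxId <= r.endTxId && r.startTxId <= l.endTxId) {
          covered = true;
          break;
        }
      }
      if (!covered) {
        missing.add(r);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // This JN missed the segment covering transactions 11-20.
    List<Segment> local = new ArrayList<>();
    local.add(new Segment(1, 10));
    local.add(new Segment(21, 30));

    List<Segment> remote = new ArrayList<>();
    remote.add(new Segment(1, 10));
    remote.add(new Segment(11, 20));
    remote.add(new Segment(21, 30));

    System.out.println(findMissing(local, remote)); // [[11-20]]
  }
}
```

The nested loop keeps the sketch short; since manifests are normally sorted by transaction ID, a real implementation could walk both lists in a single merge-style pass instead.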

This message was sent by Atlassian JIRA
