hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them
Date Thu, 02 Feb 2017 22:37:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850633#comment-15850633 ]

Jing Zhao commented on HDFS-4025:
---------------------------------

The failed unit test should be unrelated and has been reported in HDFS-10644.

Meanwhile, the current patch may still hit an issue while an HA upgrade is in progress. If
a segment download is happening when the admin tries to roll back, the deletion of the
{{current}} directory may fail on Windows. As a fix, we can disable the sync while a
{{prev}} directory exists on the JN (which means the upgrade is still in progress). Or we can
download the segment into another directory first.
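
A minimal sketch of the two options above, for illustration only. It is not taken from
the patch: the class name, method names, and the {{edits.sync}} staging directory name are
all hypothetical, and it assumes {{current}} and {{prev}} sit directly under one storage root.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of the two proposed fixes; not the HDFS-4025 patch. */
class SegmentSyncSketch {

  /** Option 1: only sync when no "prev" directory exists (no upgrade in progress). */
  static boolean canSync(Path storageRoot) {
    return !Files.isDirectory(storageRoot.resolve("prev"));
  }

  /**
   * Option 2: download into a staging directory outside "current", then move
   * the finished file in, so no file under "current" is held open during the download.
   * Returns false if "current" has disappeared (e.g. a rollback happened meanwhile).
   */
  static boolean syncSegment(Path storageRoot, String segmentName, InputStream remote)
      throws IOException {
    // "edits.sync" is an assumed name for the staging directory.
    Path staging = Files.createDirectories(storageRoot.resolve("edits.sync"));
    Path tmp = staging.resolve(segmentName);
    Files.copy(remote, tmp, StandardCopyOption.REPLACE_EXISTING);

    Path current = storageRoot.resolve("current");
    if (!Files.isDirectory(current)) {
      Files.delete(tmp); // discard the download; nothing under "current" was touched
      return false;
    }
    Files.move(tmp, current.resolve(segmentName), StandardCopyOption.ATOMIC_MOVE);
    return true;
  }
}
```

A caller could combine the two by checking {{canSync}} before each {{syncSegment}} call. Note
that the check and the download are not atomic with respect to an upgrade or rollback
starting, so a real implementation would still need some coordination with those operations.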

For now, I think we can disable this feature by default in the configuration and use
separate JIRAs to track the remaining issues. This also gives us time to do more testing.
Thoughts?

> QJM: Sychronize past log segments to JNs that missed them
> ---------------------------------------------------------
>
>                 Key: HDFS-4025
>                 URL: https://issues.apache.org/jira/browse/HDFS-4025
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Hanisha Koneru
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, HDFS-4025.002.patch, HDFS-4025.003.patch,
>                      HDFS-4025.004.patch, HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch,
>                      HDFS-4025.008.patch, HDFS-4025.009.patch, HDFS-4025.010.patch
>
>
> Currently, if a JournalManager crashes and misses some segment of logs, and then comes
> back, it will be re-added as a valid part of the quorum on the next log roll. However, it
> will not have a complete history of log segments (i.e., any individual JN may have gaps in its
> transaction history). This mirrors the behavior of the NameNode when there are multiple local
> directories specified.
> However, it would be better if a background thread noticed these gaps and "filled them
> in" by grabbing the segments from other JournalNodes. This increases the resilience of the
> system when JournalNodes get reformatted or otherwise lose their local disk.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

