hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them
Date Mon, 29 Aug 2016 21:49:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447153#comment-15447153 ]

Jing Zhao commented on HDFS-4025:

Thanks for working on this, [~hanishakoneru]. Some early comments:
# {{IPCLoggerChannel}} may be too heavyweight for this use case. Each sync session only needs one JN RPC proxy and its corresponding HTTP address.
# We can put all the sync logic in a separate class.
# {{downloadEditLogFromJournalHttpServer}} can also be defined in the new class. Note that {{TransferImage}} is in the namenode package.
# We need to know the result of each sync (or wait until it times out) before we start the next sync. So we can do the sync in a blocking way, and make sure we set the correct timeout for the RPC/HTTP connection.
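
To illustrate the last point, here is a minimal sketch of running syncs one at a time and waiting for each result or a timeout before starting the next. It is not code from the patch: {{BlockingSegmentSyncer}}, {{Result}}, and {{syncSequentially}} are hypothetical names, and each sync is modelled as a plain {{Callable<Boolean>}}. It also enforces the timeout with {{Future.get}} on a single worker thread, which is a stand-in for setting timeouts on the RPC/HTTP connection itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; not the class from the HDFS-4025 patch. */
class BlockingSegmentSyncer {
  enum Result { SYNCED, FAILED, TIMED_OUT }

  /**
   * Runs the given sync tasks strictly one after another. The next task is
   * only submitted once the previous one has returned, failed, or timed out.
   */
  static List<Result> syncSequentially(List<Callable<Boolean>> tasks, long timeoutMs) {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    List<Result> results = new ArrayList<>();
    try {
      for (Callable<Boolean> task : tasks) {
        Future<Boolean> pending = worker.submit(task);
        try {
          // Block until this sync reports its outcome or the timeout expires.
          boolean ok = pending.get(timeoutMs, TimeUnit.MILLISECONDS);
          results.add(ok ? Result.SYNCED : Result.FAILED);
        } catch (TimeoutException e) {
          // Interrupt the stuck sync so the worker is free for the next one.
          pending.cancel(true);
          results.add(Result.TIMED_OUT);
        } catch (ExecutionException e) {
          // The sync itself threw (for example, an IOException).
          results.add(Result.FAILED);
        } catch (InterruptedException e) {
          // The caller was interrupted: restore the flag and stop syncing.
          Thread.currentThread().interrupt();
          pending.cancel(true);
          results.add(Result.FAILED);
          break;
        }
      }
    } finally {
      worker.shutdownNow();
    }
    return results;
  }
}
```

One caveat with this stand-in: a task that ignores interruption would keep the single worker busy after its timeout, which is one reason connection-level timeouts are the more robust choice.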

> QJM: Sychronize past log segments to JNs that missed them
> ---------------------------------------------------------
>                 Key: HDFS-4025
>                 URL: https://issues.apache.org/jira/browse/HDFS-4025
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: QuorumJournalManager (HDFS-3077)
>         Attachments: HDFS-4025.000.patch
> Currently, if a JournalManager crashes and misses some segment of logs, and then comes back, it will be re-added as a valid part of the quorum on the next log roll. However, it will not have a complete history of log segments (i.e., any individual JN may have gaps in its transaction history). This mirrors the behavior of the NameNode when there are multiple local directories specified.
> However, it would be better if a background thread noticed these gaps and "filled them in" by grabbing the segments from other JournalNodes. This increases the resilience of the system when JournalNodes get reformatted or otherwise lose their local disk.
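
As a rough illustration of the gap detection the description asks for, the sketch below compares the segments one JN holds with those held by a peer and lists the peer segments that would need to be fetched. The {{Segment}} and {{SegmentGapFinder}} classes are hypothetical and are not HDFS types; a segment is modelled simply as an inclusive range of transaction IDs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a finalized edit log segment; not an HDFS class. */
final class Segment {
  final long firstTxId;
  final long lastTxId;

  Segment(long firstTxId, long lastTxId) {
    if (firstTxId > lastTxId) {
      throw new IllegalArgumentException("firstTxId must not exceed lastTxId");
    }
    this.firstTxId = firstTxId;
    this.lastTxId = lastTxId;
  }

  @Override
  public String toString() {
    return "[" + firstTxId + "-" + lastTxId + "]";
  }
}

/** Hypothetical helper that finds segments a JN is missing. */
final class SegmentGapFinder {
  /**
   * Returns the peer's segments that are not fully covered by any single
   * local segment, in the order the peer lists them.
   */
  static List<Segment> missingSegments(List<Segment> local, List<Segment> peer) {
    List<Segment> missing = new ArrayList<>();
    for (Segment remote : peer) {
      boolean covered = false;
      for (Segment mine : local) {
        if (mine.firstTxId <= remote.firstTxId && mine.lastTxId >= remote.lastTxId) {
          covered = true;
          break;
        }
      }
      if (!covered) {
        missing.add(remote);
      }
    }
    return missing;
  }
}
```

For example, a JN holding transactions 1-10 and 21-30 compared against a peer holding 1-10, 11-20, 21-30, and 31-40 would be told to fetch 11-20 and 31-40.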
