hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them
Date Thu, 12 Jan 2017 01:15:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819779#comment-15819779 ]

Jing Zhao commented on HDFS-4025:
---------------------------------

Thanks for updating the patch, [~hkoneru]. Some further comments:
# We do not need to move getAddressesList() to DatanodeUtil.
# getAddressList() and getOtherJournalNodeAddrs can be combined into one util method:
getLoggerAddresses(URI uri, Set<InetSocketAddress> toExclude).
# Need to clean up the unused imports and unused variables in JournalNodeSyncer.java.
# sync_journals_timeout should not be retrieved from a newly created configuration in a static
code block. It should be initialized based on the configuration passed to JournalNodeSyncer
constructor.
# We need to make sure syncJournalDaemon is always running while the JN is alive. So syncJournals
should be in a try-catch block which catches Throwables. Please see BlockManager.RedundancyMonitor#run
as an example.
# Need to stop syncers when stopping JN.
# The temp log segment files should always be downloaded into the current directory. Thus
downloadEditLogFromJournalHttpServer can be further simplified.
# The current code may hit a race during the rolling-upgrade rollback. If the rollback happens,
some log segments may be deleted while a syncer may download them from a remote JN which gets
delayed in the rollback. Thus renaming temp journal files needs to be protected by Journal's
monitor and we need to make sure its end index is smaller than the current committedTxnId.
# We can consider adding a configuration flag to turn off this feature.
# We do not need to get the local log manifest for each sync. The local log segment
manifest can be reused.
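
To illustrate comments 4 to 6 above (timeout taken from the configuration passed to the
constructor, a daemon loop that survives any Throwable, and an explicit stop), here is a
minimal Java sketch. It is not the patch code: SyncerSketch, SYNC_INTERVAL_KEY, the default
value, and the Map-based configuration are hypothetical stand-ins for JournalNodeSyncer and
Hadoop's Configuration, and logging-and-continuing on failure is an assumption about the
desired behavior.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for JournalNodeSyncer; names and defaults are assumptions.
class SyncerSketch {
  // Hypothetical key name; the real key is not shown in this message.
  static final String SYNC_INTERVAL_KEY = "sketch.journalnode.sync.interval.ms";

  private final long syncIntervalMs;
  private final Runnable syncJournals;
  private final AtomicInteger failures = new AtomicInteger();
  private volatile boolean shouldRun = true;
  private Thread daemon;

  // The interval is read from the conf handed to the constructor,
  // not from a configuration created in a static block.
  SyncerSketch(Map<String, String> conf, Runnable syncJournals) {
    this.syncIntervalMs =
        Long.parseLong(conf.getOrDefault(SYNC_INTERVAL_KEY, "120000"));
    this.syncJournals = syncJournals;
  }

  long getSyncIntervalMs() { return syncIntervalMs; }

  int getFailureCount() { return failures.get(); }

  synchronized boolean isRunning() { return daemon != null && daemon.isAlive(); }

  // One sync round. Any Throwable is recorded instead of killing the caller.
  boolean runOnce() {
    try {
      syncJournals.run();
      return true;
    } catch (Throwable t) {
      failures.incrementAndGet(); // a real syncer would also log t here
      return false;
    }
  }

  synchronized void start() {
    daemon = new Thread(this::run, "syncJournalDaemon-sketch");
    daemon.setDaemon(true);
    daemon.start();
  }

  private void run() {
    while (shouldRun) {
      runOnce();
      try {
        Thread.sleep(syncIntervalMs);
      } catch (InterruptedException e) {
        return; // stop() interrupts the sleep to shut down promptly
      }
    }
  }

  // Called when the owning JournalNode stops.
  void stop() {
    shouldRun = false;
    Thread t;
    synchronized (this) { t = daemon; }
    if (t == null) {
      return;
    }
    t.interrupt();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

In this sketch a failed round only increments a counter, so the next round still runs after
the configured interval; whether the real syncer should retry, back off, or escalate on
repeated failures is a separate design choice.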

> QJM: Sychronize past log segments to JNs that missed them
> ---------------------------------------------------------
>
>                 Key: HDFS-4025
>                 URL: https://issues.apache.org/jira/browse/HDFS-4025
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Hanisha Koneru
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, HDFS-4025.002.patch, HDFS-4025.003.patch,
HDFS-4025.004.patch, HDFS-4025.005.patch, HDFS-4025.006.patch
>
>
> Currently, if a JournalManager crashes and misses some segment of logs, and then comes
back, it will be re-added as a valid part of the quorum on the next log roll. However, it
will not have a complete history of log segments (i.e., any individual JN may have gaps in its
transaction history). This mirrors the behavior of the NameNode when there are multiple local
directories specified.
> However, it would be better if a background thread noticed these gaps and "filled them
in" by grabbing the segments from other JournalNodes. This increases the resilience of the
system when JournalNodes get reformatted or otherwise lose their local disk.
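
The gap detection described above can be sketched as a comparison of two segment lists.
This is only an illustration under stated assumptions: SegmentGapSketch and Segment are
hypothetical names, segments are assumed to be finalized and identified by their starting
transaction ID, and fetching the manifests and downloading the files is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of the HDFS code base.
class SegmentGapSketch {
  // A finalized log segment covering transactions startTxId..endTxId inclusive.
  static final class Segment {
    final long startTxId;
    final long endTxId;

    Segment(long startTxId, long endTxId) {
      this.startTxId = startTxId;
      this.endTxId = endTxId;
    }
  }

  // Returns the remote segments whose start txid has no local counterpart,
  // in the order they appear in the remote list.
  static List<Segment> missingSegments(List<Segment> local, List<Segment> remote) {
    Set<Long> localStarts = new HashSet<>();
    for (Segment s : local) {
      localStarts.add(s.startTxId);
    }
    List<Segment> missing = new ArrayList<>();
    for (Segment s : remote) {
      if (!localStarts.contains(s.startTxId)) {
        missing.add(s);
      }
    }
    return missing;
  }
}
```

For example, with local segments 1-10 and 21-30 and remote segments 1-10, 11-20, and 21-30,
the only segment reported as missing is 11-20.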



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

