hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4025) QJM: Sychronize past log segments to JNs that missed them
Date Wed, 18 Jan 2017 22:41:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828903#comment-15828903
] 

Jing Zhao commented on HDFS-4025:
---------------------------------

Thanks for the update, [~hkoneru]. The current patch looks better. Further comments:
# The journal segment transfer timeout should not share the same configuration property as the image transfer
timeout, since a log segment is usually smaller than the fsimage. Let's create a new configuration
property for it.
# Accordingly we do not need a public method Util#getImageTransferTimeout.
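For illustration, a minimal sketch of what the new property could look like (the key name and default below are placeholders, not final names):
{code}
// Hypothetical: a dedicated timeout key for edit log segment transfer,
// analogous to dfs.image.transfer.timeout. Name and default are illustrative.
public static final String DFS_EDIT_LOG_TRANSFER_TIMEOUT_KEY =
    "dfs.edit.log.transfer.timeout";
public static final int DFS_EDIT_LOG_TRANSFER_TIMEOUT_DEFAULT = 30 * 1000;
{code}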
# In Storage.java, the following is a good change, but the code needs reformatting so that calls
like "sd.getStorageDirType" do not break across two lines. Besides, I think the patch will
no longer use DirIterator, so this change can also be done in a separate jira.
{code}
     private boolean shouldReturnNextDir() {
       StorageDirectory sd = getStorageDir(nextIndex);
-      return (dirType == null || sd.getStorageDirType().isOfType(dirType)) &&
-          (includeShared || !sd.isShared());
+      return (dirType == null || (sd.getStorageDirType() != null && sd
+        .getStorageDirType().isOfType(dirType))) && (includeShared || !sd
+        .isShared());
     }
{code}
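For illustration, one way the reformatted method could look (extracting the dir type into a local variable so that no call chain gets split across lines):
{code}
private boolean shouldReturnNextDir() {
  StorageDirectory sd = getStorageDir(nextIndex);
  StorageDirType type = sd.getStorageDirType();
  return (dirType == null || (type != null && type.isOfType(dirType)))
      && (includeShared || !sd.isShared());
}
{code}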
# No need to define EDITS/EDITS_INPROGRESS etc. again in JNStorage. Currently the JournalNode
shares the same storage layout as the NameNode and directly uses FileJournalManager, which is
defined in the namenode package, so it is fine to use the EDITS/EDITS_INPROGRESS constants defined in NNStorage.
We can do further code cleanup as a follow-on task.
# Similarly please see if we still need JNStorage#getTemporaryEditsFile and JNStorage#getFinalizedEditsFile.
# getNamespaceInfo can be defined in Storage, with NNStorage overriding it; JNStorage can
then use the base version directly.
# Journal#renameTemporarySegments can be renamed to renameTmpSegment since we're renaming
a single segment here. Also, there is no need to call Util#deleteTmpFiles; simply call File#delete
and check its result.
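A minimal sketch of the delete-and-check, assuming the tmp file variable is named tmpEditsFile:
{code}
// Illustrative only: delete the temporary file directly and check the result
// instead of going through Util#deleteTmpFiles.
if (!tmpEditsFile.delete()) {
  LOG.warn("Failed to delete temporary edits file " + tmpEditsFile);
}
{code}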
# In JournalNodeSyncer, some fields (e.g., journal, jn, jnStorage, conf) can be declared as
final. "NULL_CONTROLLER" can be skipped.
# Maybe we do not need two lists, otherJournalNodeAddrs and journalNodeProxiesList. We can
create a wrapper class that holds both the InetSocketAddress and the QJournalProtocolPB proxy; in this
way we only need one list.
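A sketch of what the wrapper could look like (class and field names are placeholders):
{code}
// Hypothetical wrapper so JournalNodeSyncer only needs a single list.
private static final class JournalNodeProxy {
  final InetSocketAddress jnAddr;
  final QJournalProtocolPB jnProxy;

  JournalNodeProxy(InetSocketAddress jnAddr, QJournalProtocolPB jnProxy) {
    this.jnAddr = jnAddr;
    this.jnProxy = jnProxy;
  }
}
{code}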
# "syncJournalDaemon.setDaemon(true);" is unnecessary since syncJournalDaemon is already a
Daemon.
# getMissingLogList cannot guarantee that the returned ArrayList is sorted by transaction
id, since the ArrayList is created from a HashSet. Therefore 1) we cannot guarantee we're
downloading older segments first, and 2) the getNextContinuousTxId logic can be wrong.
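One simple way to address the ordering, sketched below (explicitly sorting by start txid; the redesign suggested in the next point may make this unnecessary):
{code}
// Illustrative: sort the missing segments by start transaction id so that
// older segments are downloaded first.
Collections.sort(missingLogs, new Comparator<RemoteEditLog>() {
  @Override
  public int compare(RemoteEditLog a, RemoteEditLog b) {
    return Long.compare(a.getStartTxId(), b.getStartTxId());
  }
});
{code}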
# The whole "getMissingLogSegments" may need to be redesigned:
#* getMissingLogList can utilize merge-sort-like logic to generate the missing list (see the sketch after this list)
#* Each time we download a missing segment successfully, we should update lastSyncedTxId accordingly.
#* Once we hit any exception while downloading from the remote JN, we can stop the current
syncing and continue downloading in the next sync session from another JN.
#* Once lastSyncedTxId has reached the last finalized segment, normally the current JN has
caught up. We can reset the lastSyncedTxId back.
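As a rough illustration of the merge-like comparison (a sketch only; it assumes both manifests are sorted by start txid and ignores in-progress segments and the lastSyncedTxId bookkeeping):
{code}
// Hypothetical helper: walk two sorted segment lists and collect the segments
// the local JN is missing, merge-sort style.
List<RemoteEditLog> getMissingLogList(List<RemoteEditLog> localLogs,
    List<RemoteEditLog> remoteLogs) {
  List<RemoteEditLog> missing = new ArrayList<>();
  int i = 0, j = 0;
  while (j < remoteLogs.size()) {
    if (i >= localLogs.size()) {
      // Local side exhausted: everything remaining on the remote side is missing.
      missing.add(remoteLogs.get(j++));
      continue;
    }
    long localTxId = localLogs.get(i).getStartTxId();
    long remoteTxId = remoteLogs.get(j).getStartTxId();
    if (localTxId == remoteTxId) {
      i++;
      j++;
    } else if (localTxId < remoteTxId) {
      i++;
    } else {
      missing.add(remoteLogs.get(j++));
    }
  }
  return missing;
}
{code}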
# Some further optimization can also be done on getMissingLogList:
#* the remote JN http URLs can be stored in JNSyncer
#* if we know some segments are missing but we did not download them in the previous sync, we
can download them directly from a new JN without calling the getEditLogManifest RPC.
These can be done separately as a follow-on.
# We also need to add a DataTransferThrottler for the downloading to avoid occupying too much
network bandwidth. See TransferFsImage for an example.
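A sketch of how the throttler could be created, following the pattern in TransferFsImage#getThrottler (the config key below is a placeholder, not an existing property):
{code}
// Illustrative only: throttle segment downloads so syncing does not saturate
// the network; 0 means no throttling.
long bandwidthPerSec =
    conf.getLong("dfs.edit.log.transfer.bandwidthPerSec", 0);
DataTransferThrottler throttler =
    bandwidthPerSec > 0 ? new DataTransferThrottler(bandwidthPerSec) : null;
{code}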
# downloadEditLogFromJournalHttpServer can have a shorter name, maybe downloadSegment?
# In downloadEditLogFromJournalHttpServer, no need to call jnStorage.getFiles since we do
not require any special storage dir type here. You can directly check if finalEditsFile exists.
{code}
    File finalEditsFile = jnStorage.getFinalizedEditsFile(log.getStartTxId(),
        log.getEndTxId());

    List<File> finalFiles = jnStorage.getFiles(null, finalEditsFile.getName());
    assert !(finalFiles.isEmpty()) : "No checkpoint targets.";
{code}
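A sketch of the simpler check, assuming the method returns a boolean success flag:
{code}
// Illustrative only: check the finalized segment file directly instead of
// going through jnStorage.getFiles.
File finalEditsFile = jnStorage.getFinalizedEditsFile(log.getStartTxId(),
    log.getEndTxId());
if (finalEditsFile.exists()) {
  // Segment is already present locally; nothing to download.
  return true;
}
{code}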
# Similarly, before calling doGetUrl there is no need to generate a tmpFiles list; instead use ImmutableList.of(tmpEditsFile).
We also need to handle Exceptions other than IOException. See Journal#syncLog as an example.
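A sketch of what this could look like (downloadSegment below is a hypothetical helper standing in for the doGetUrl call in the patch):
{code}
// Illustrative only: single-element immutable list instead of a temporary list,
// and a broad catch so unexpected runtime errors do not kill the sync thread,
// in the spirit of Journal#syncLog.
List<File> localPaths = ImmutableList.of(tmpEditsFile);
try {
  downloadSegment(url, localPaths);  // hypothetical helper wrapping doGetUrl
} catch (Throwable t) {
  LOG.error("Failed to download edit log segment from " + url, t);
  return false;
}
{code}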
# renameTemporarySegments can be renamed to renameTmpSegment. We can let this method return a
boolean; in this way, if the rename fails, the tmp file deletion can be done outside the lock.
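A caller-side sketch (names are illustrative; renameTmpSegment is assumed to be a synchronized Journal method that returns whether the rename succeeded):
{code}
// Rename happens inside the Journal lock; the tmp file cleanup on failure
// happens here, outside that lock.
boolean renamed = journal.renameTmpSegment(tmpEditsFile, finalEditsFile);
if (!renamed && !tmpEditsFile.delete()) {
  LOG.warn("Failed to delete temporary edits file " + tmpEditsFile);
}
{code}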
# For the test we can set a smaller sync interval so that the test can be faster.
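For example, something along these lines in the test setup (the key name is a placeholder):
{code}
// Illustrative only: shrink the sync interval so the test completes quickly.
conf.setLong("dfs.journalnode.sync.interval", 1000L);
{code}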
# We need to add more tests to cover different scenarios:
#* multiple segments are missing
#* discontinuous segments are missing
#* more than one JN has missing segments
#* a new segment goes missing while syncing is in progress


> QJM: Sychronize past log segments to JNs that missed them
> ---------------------------------------------------------
>
>                 Key: HDFS-4025
>                 URL: https://issues.apache.org/jira/browse/HDFS-4025
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Hanisha Koneru
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, HDFS-4025.002.patch, HDFS-4025.003.patch,
HDFS-4025.004.patch, HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch
>
>
> Currently, if a JournalManager crashes and misses some segment of logs, and then comes
back, it will be re-added as a valid part of the quorum on the next log roll. However, it
will not have a complete history of log segments (i.e. any individual JN may have gaps in its
transaction history). This mirrors the behavior of the NameNode when there are multiple local
directories specified.
> However, it would be better if a background thread noticed these gaps and "filled them
in" by grabbing the segments from other JournalNodes. This increases the resilience of the
system when JournalNodes get reformatted or otherwise lose their local disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


