hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "James Thomas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-6777) Supporting consistent edit log reads when in-progress edit log segments are included
Date Wed, 30 Jul 2014 00:22:38 GMT

     [ https://issues.apache.org/jira/browse/HDFS-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

James Thomas updated HDFS-6777:

    Attachment: 6777-design.pdf

> Supporting consistent edit log reads when in-progress edit log segments are included
> ------------------------------------------------------------------------------------
>                 Key: HDFS-6777
>                 URL: https://issues.apache.org/jira/browse/HDFS-6777
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: qjm
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: 6777-design.pdf, HDFS-6777.patch
> For inotify, we want to be able to read transactions from in-progress edit log segments
so we can serve transactions to listeners soon after they are committed.
> Inotify clients ask the active NN for transactions, and the NN opens EditLogInputStreams
that pull transactions from its various edit repositories (local directories, individual JournalNodes,
etc.). The goal is to send back only successfully sync’ed transactions. What constitutes
a successful sync varies among the edit repositories. In the case of the JournalNodes, a successful
sync requires a write to a quorum of the JournalNodes. Properly sync’ed transactions are
always applied to the namesystem, and unsync’ed transactions are never applied. So it is
natural that listeners are only interested in sync’ed transactions.
> In particular, the NN reads from EditLogInputStreams that it obtains from FSEditLog.selectInputStreams
(inProgressOk = true) and sends transactions back to the client. Some infrastructural changes
are required to prevent any unsync’ed transactions from reaching the client, and we tackle
those in this JIRA.
> The current implementation of selectInputStreams (inProgressOk = true, fromTxId = X)
in FSEditLog essentially asks all available edit log repositories for all in-progress or finalized
log segments with transactions after and including X, and combines log segments starting at
the same transaction ID in a RedundantEditLogInputStream. 
> The first issues with this is that the RedundantEditLogInputStream may combine finalized
and in-progress segments starting at the same txid, and may serve unsync’ed transactions
from the in-progress segments. So we simply discard in-progress segments if we have any finalized
segments starting at the same txid -- the modifcation is to JournalSet.chainAndMakeRedundantStreams.
> The second change we need to make is for the case where we only have in-progress segments
starting at a particular txid. In this case, we know we are at the log segment the NN currently
has open, since the various journals ensure that if there are finalized segments available
for a particular starting txid, we are able to see them (e.g. reads from QJM require a quorum
of JournalNodes to respond, and a finalized segment must be present on at least one JN in
the quorum). Our goal in this case is to return only the in-progress segments being written
by the current NN. This happens trivially in the local edits directory case and the NFS shared
edits case, since there is only a single “replica” and if the NN is writing to it, it
is fully up-to-date. So we focus on QJM here. The key is that JournalNodes maintain a lastWriterEpoch,
which increases monotonically as new writers arrive. So we keep track of the lastWriterEpochs
of the segments we receive from the JNs and discard any segments with lastWriterEpochs less
than the maximum one seen. We know that when a new writer contacts the JournalNodes, it finalizes
all sync’ed transactions, so in-progress segments with out-of-date lastWriterEpochs may
contain unsync’ed transactions and any sync’ed transactions they contain will also be
contained in finalized segments on a quorum on JNs. So we can safely discard them. The segments
with the largest lastWriterEpochs may also contain unsync’ed transactions from a previous
writer if the current NN has not yet sync’ed any transactions, but in this case we will
make sure to read only from finalized segments (logic not in this JIRA). These segments may
also contain unsync’ed transactions from the current NN (e.g. which we read in the window
where the NN has written to less than a quorum of JNs but has not yet crashed due to failure
to reach a quorum on write), but we only return to the client transactions with txids less
than or equal to last txid the NN has sync’ed, a value it already stores in a variable (logic
not in this JIRA). The main modification for this change is to QuorumJournalManager.selectInputStreams.

This message was sent by Atlassian JIRA

View raw message