hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2018) 1073: Move all journal stream management code into one place
Date Fri, 12 Aug 2011 23:55:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084480#comment-13084480 ]

Todd Lipcon commented on HDFS-2018:
-----------------------------------

{quote}
The patch adds EditLogReference as a new concept which actually complicates the API. Because
now, in FSEditLog and FSImage, where JournalManager is being used, the code needs to use the
new interface.
Ivan's patch uses EditLogInputStream which has been in use already.
{quote}

The issue with using EditLogInputStream is that this interface represents an _opened_ stream.
Imagine if {{listFiles()}} on a directory returned a {{List<InputStream>}} -- that would
obviously be a bad API. I feel the same way about {{selectInputStreams}} -- you should
be able to enumerate which streams you plan to open before actually opening them.
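To illustrate the distinction, here's a minimal sketch (hypothetical names and signatures, not the actual patch code): references can be enumerated cheaply, and nothing is opened until the caller decides to read.

```java
import java.util.Arrays;
import java.util.List;

public class ReferenceSketch {
  // Hypothetical lightweight reference to a log segment: carries the
  // transaction range but holds no open file handle.
  static class EditLogReference {
    final long firstTxId, lastTxId;
    EditLogReference(long firstTxId, long lastTxId) {
      this.firstTxId = firstTxId;
      this.lastTxId = lastTxId;
    }
  }

  // Analogous to listing files rather than returning List<InputStream>:
  // enumerate the available segments without opening any of them.
  static List<EditLogReference> getEditLogs() {
    return Arrays.asList(new EditLogReference(1, 100),
                         new EditLogReference(101, 200));
  }

  public static void main(String[] args) {
    List<EditLogReference> refs = getEditLogs();
    // The caller can now plan which segments to open, and only then
    // open them one at a time.
    System.out.println(refs.size() + " segments, starting at txid "
        + refs.get(0).firstTxId);
  }
}
```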

Doing this also allows {{getRemoteEditLogs}} to share code with the startup path -- it's obvious
that {{getRemoteEditLogs}} needs to list streams without actually opening them. So we might as
well reuse the same interface: the JournalManager just needs to expose its list of available
log segments, and FSEditLog is responsible for using that list either for loading or for edits
transfer.

{quote}
EditLogReference actually leaks the idea of segments in different parts of the code, and with
another name.
{quote}
I'm happy to rename this to {{TransactionRange}} or something instead. Regardless of the backend
storage, we need to think about each journal manager providing a set of ranges of transaction
IDs. Then, the algorithm for startup is to simply take the various ranges and merge them to
make a complete range.

I found the code implementing this "merge distinct sets of ranges to cover the target range"
algorithm a lot easier to understand when it deals with the full {{List<EditLogReference>}},
rather than having to keep going back to each JournalManager in turn as it builds up a
list.
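As a rough illustration, the "merge ranges to cover the target" step can be sketched as a greedy pass over the combined list of ranges from all journals. This is a simplification under assumed names ({{TxRange}}, {{selectLogsToRead}} stand in for whatever the patch actually calls them):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RangeMergeSketch {
  // Hypothetical transaction range [first, last], inclusive on both ends.
  static class TxRange {
    final long first, last;
    TxRange(long first, long last) { this.first = first; this.last = last; }
  }

  // Greedy cover: given ranges gathered from all journal managers, pick a
  // chain of segments covering [startTxId, endTxId], or fail on a gap.
  static List<TxRange> selectLogsToRead(List<TxRange> all,
                                        long startTxId, long endTxId) {
    List<TxRange> sorted = new ArrayList<>(all);
    sorted.sort(Comparator.comparingLong((TxRange r) -> r.first));
    List<TxRange> plan = new ArrayList<>();
    long next = startTxId;   // next transaction id still needing coverage
    int i = 0;
    while (next <= endTxId) {
      // Among candidates starting at or before 'next', take the one
      // reaching furthest; any candidate we skip reaches less far.
      TxRange best = null;
      while (i < sorted.size() && sorted.get(i).first <= next) {
        TxRange r = sorted.get(i++);
        if (best == null || r.last > best.last) best = r;
      }
      if (best == null || best.last < next)
        throw new IllegalStateException("gap in edit logs at txid " + next);
      plan.add(best);
      next = best.last + 1;
    }
    return plan;
  }

  public static void main(String[] args) {
    // Two journals contribute overlapping segments.
    List<TxRange> all = Arrays.asList(
        new TxRange(1, 100), new TxRange(1, 150), new TxRange(101, 200));
    for (TxRange r : selectLogsToRead(all, 1, 200))
      System.out.println(r.first + "-" + r.last);  // prints 1-150 then 101-200
  }
}
```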

{quote}
The recoverUnclosedLogs should not be implemented by a journal. A simpler JournalManager interface
is better, where it just knows what is in progress and what is finalized.
{quote}

The issue is that we sometimes need to read from an in-progress log without finalizing it:
for example in the standby master in an HA setup. If the {{getInputStream}} API finalizes
the log, as you have it, then this becomes impossible, since we'll end up finalizing a log
under the active writer. Like you said earlier, there are distinct read operations and write
operations -- {{getInputStream}} is a read operation. {{recoverLogs}} is a write operation.
Therefore only the active master should call {{recoverLogs}}, and a standby using {{getInputStream}}
should not have any mutative effect on the underlying store.
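A toy sketch of that split, with invented names to show the shape (the real classes obviously carry much more state): recovery happens only on the write path, and the read path touches nothing.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoverySketch {
  // Hypothetical minimal journal: records whether recovery (a write
  // operation) was ever requested on it.
  static class Journal {
    boolean recovered = false;
    void recoverUnclosedLogs() { recovered = true; }   // mutates storage
    String getInputStream(long fromTxId) {             // read-only
      return "stream@" + fromTxId;
    }
  }

  // The edit-log layer owns the active/standby lifecycle, so the
  // decision to recover lives in one place, not inside each journal.
  static class EditLog {
    final List<Journal> journals = new ArrayList<>();
    // Active writer path: recover unclosed logs before writing.
    void openForWrite() {
      for (Journal j : journals) j.recoverUnclosedLogs();
    }
    // Standby/reader path: open a stream without mutating the store.
    String openForRead(long fromTxId) {
      return journals.get(0).getInputStream(fromTxId);
    }
  }

  public static void main(String[] args) {
    EditLog log = new EditLog();
    Journal j = new Journal();
    log.journals.add(j);
    log.openForRead(42);              // standby read: no side effects
    System.out.println(j.recovered);  // false
    log.openForWrite();               // becoming active triggers recovery
    System.out.println(j.recovered);  // true
  }
}
```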

{quote}
The JournalManager must know whether it is a writer or not to verify whether it can create
OutputStreams or not. Since we already have this information, we can use it to inform our
decision whether to finalize recovered in-progress logs or not
{quote}
I'm not sure why the JournalManager needs this information. FSEditLog is the one responsible
for coordinating the lifecycle (e.g. opening logs for writing or reading). So FSEditLog has
the state, and can use it in a single place to determine whether to ask the JournalManagers
to recover unclosed logs.

Similarly, in the future, FSEditLog can use this API whenever it detects that a JournalManager
which previously failed has "come back" -- that's a good time to finalize whatever logs might
have been truncated by the failure.

{quote}
The new approach also retains getInProgressInputStream() which is only used in a special case
in BackupNode. We shouldn't have methods in the JournalStream API only for corner cases like
this. My patch gets rid of this. It should be possible to do the same with the other approach
{quote}
I'm not sure it's really a "special case" in the BackupNode -- it's the only way the BN can get
back in sync with the primary. We will also use it in the StandbyNode, I imagine -- in both
cases we need to be able to read from logs that may still be in the process of being written,
in order to stay in sync.

{quote}
getNumberOfTransactions could be replaced with a getTransactionRanges call. The editlog could
then select the list of transaction ranges it wants to load. It then calls getInputStream()
on the JournalManagers to get a stream starting with the firstTxId of the range. I had considered
an approach vaguely similar before, but didn't implement it.
{quote}
Yes, this is essentially what {{getEditLogs}} is and what {{selectLogsToRead}} is doing. I'm
happy to rename {{EditLogReference}} to {{TransactionRange}} and {{getEditLogs}} to {{getReadableTransactionRanges}}.

{quote}
Removing the openForEdit() call makes it equivalent to RemoteEditLog, so RemoteEditLog should
be removed
{quote}
RemoteEditLog is a "wire type" for transfer to the 2NN and other checkpointers. Keeping the
protocol types distinct from the internal types is a best practice people have been encouraging
recently.
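The boundary can be sketched like this (illustrative names and fields only; the real {{RemoteEditLog}} is a fuller class): the internal type and the wire type stay separate, with an explicit conversion at the protocol edge.

```java
public class WireTypeSketch {
  // Internal bookkeeping type (hypothetical name for illustration).
  static class TxRange {
    final long first, last;
    TxRange(long first, long last) { this.first = first; this.last = last; }
  }

  // Wire type sent to checkpointers (e.g. the 2NN). Kept as a separate
  // class so the transfer format can evolve independently of internals.
  static class RemoteEditLog {
    final long startTxId, endTxId;
    RemoteEditLog(long startTxId, long endTxId) {
      this.startTxId = startTxId;
      this.endTxId = endTxId;
    }
  }

  // Explicit conversion at the protocol boundary.
  static RemoteEditLog toWire(TxRange r) {
    return new RemoteEditLog(r.first, r.last);
  }

  public static void main(String[] args) {
    WireTypeSketch.RemoteEditLog wire = toWire(new TxRange(1, 100));
    System.out.println(wire.startTxId + "-" + wire.endTxId);  // prints 1-100
  }
}
```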

> 1073: Move all journal stream management code into one place
> ------------------------------------------------------------
>
>                 Key: HDFS-2018
>                 URL: https://issues.apache.org/jira/browse/HDFS-2018
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 0.23.0
>
>         Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
hdfs-2018-otherapi.txt, hdfs-2018.txt
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in FileJournalManager
and the code for input streams is in the inspectors. This change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deals with URIs when referring to edit logs
>   - Recovery of inprogress logs is performed by counting the number of transactions instead
of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
