hadoop-hdfs-issues mailing list archives

From "Ivan Kelly (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2018) Move all journal stream management code into one place
Date Sun, 10 Jul 2011 19:51:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062798#comment-13062798 ]

Ivan Kelly commented on HDFS-2018:

> in selectInputStream, it's counting both finalized and unfinalized transactions. But at startup,
> it should be recovering all of the inprogress logs to finalized logs, right? Given that, I
> don't think we need the API getNumberOfTransactions – ie we only need the finalized one.

We need both. There are two occasions on which you need to count the number of transactions
on a journal: startup and checkpointing. At startup you want to consider inprogress logs, since
they're the result of a crash. When checkpointing you shouldn't consider them, because the
primary is still writing to an inprogress log.
With a file-based journal, you cannot tell whether you are starting up or checkpointing without
some kind of write lease for the journal, which we don't have now (it may be a nice thing to
have in future).
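
To illustrate the distinction, here is a minimal sketch of counting with and without inprogress logs. The class and method names (LogSegment, SketchJournal, countTransactions) are hypothetical and are not the API in the patch; the sketch also assumes the last txid of an inprogress segment has already been found by reading the file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one edit log segment; not the actual HDFS class. */
final class LogSegment {
    final long firstTxId;
    final long lastTxId;      // assumed already determined by reading the log
    final boolean inProgress;

    LogSegment(long firstTxId, long lastTxId, boolean inProgress) {
        if (lastTxId < firstTxId) {
            throw new IllegalArgumentException("empty or inverted txid range");
        }
        this.firstTxId = firstTxId;
        this.lastTxId = lastTxId;
        this.inProgress = inProgress;
    }
}

/** Hypothetical journal that can count with or without inprogress segments. */
final class SketchJournal {
    private final List<LogSegment> segments = new ArrayList<>();

    void add(LogSegment segment) {
        segments.add(segment);
    }

    /**
     * Counts contiguous transactions starting at fromTxId.
     * includeInProgress = true models startup (inprogress logs are crash
     * leftovers); false models checkpointing (the primary is still writing).
     */
    long countTransactions(long fromTxId, boolean includeInProgress) {
        long next = fromTxId;
        boolean advanced = true;
        while (advanced) {
            advanced = false;
            for (LogSegment s : segments) {
                if (s.firstTxId == next && (includeInProgress || !s.inProgress)) {
                    next = s.lastTxId + 1;   // segment extends the contiguous run
                    advanced = true;
                }
            }
        }
        return next - fromTxId;
    }
}
```

With a finalized segment covering txids 1 to 100 and an inprogress segment covering 101 to 150, the startup-style count from txid 1 includes both segments, while the checkpoint-style count stops at the end of the finalized one.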

> the API change on the StorageArchiver interface seems less than ideal – an archiver may
> very well want to know the txid range of a log to know what to do with it – any way we can
> preserve this?

I've put the txid range back into this API. I haven't used the FoundFSImage and FoundEditLog
interfaces though, as it would create a circular dependency between StorageInspector and StorageArchiver.
Also, FoundEditLog has gone away, so using File and longs makes it more uniform.
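
As a rough sketch of what "File and longs" could look like, the following shows an archiver callback that receives the log file plus its txid range. The interface name, method signature, and file names are assumptions for illustration only; the actual signature in the patch may differ.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical archiver callback taking a File and a txid range as longs. */
interface SketchStorageArchiver {
    void archiveLog(File log, long firstTxId, long lastTxId);
}

/** Example archiver that records only logs wholly older than a threshold. */
final class ThresholdArchiver implements SketchStorageArchiver {
    private final long minTxIdToKeep;
    final List<File> archived = new ArrayList<>();

    ThresholdArchiver(long minTxIdToKeep) {
        this.minTxIdToKeep = minTxIdToKeep;
    }

    @Override
    public void archiveLog(File log, long firstTxId, long lastTxId) {
        // The txid range lets the archiver decide what to do with the log.
        if (lastTxId < minTxIdToKeep) {
            archived.add(log);
        }
    }
}
```

Because the range arrives as plain longs, the archiver needs no reference to the inspector's types, which avoids the circular dependency mentioned above.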

> the idea of the "remote edit log manifest" and the way we do edits transfer is inextricably
> linked to the idea of log segments. But, the new JournalManager APIs are based on the idea
> that logs are just sequences with no segmenting. I think having both ideas coexist is fairly
> confusing and a good opening for bugs – eg right now, the JournalManagers can return RemoteEditLogs
> for any transaction range, but the GetImageServlet still expects files. If edits are to be
> decoupled from files, then RemoteEditLogs should probably include a URI which identifies an
> edits transfer method. For FileJournalManager, the URI would be http-based and simply point
> to the GetImageServlet, but with BK-based logs it would point to the ZK ledger, right?

Further to what I said about URIs last week, I spoke to Jitendra about this transfer before
and he said that the plan was to take this functionality out of band, with rsync or something.
Now that image and logs are decoupled this is possible.

> Move all journal stream management code into one place
> ------------------------------------------------------
>                 Key: HDFS-2018
>                 URL: https://issues.apache.org/jira/browse/HDFS-2018
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
> HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in FileJournalManager
> and the code for input streams is in the inspectors. This change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deal with URIs when referring to edit logs.
>   - Recovery of inprogress logs is performed by counting the number of transactions instead
>     of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira