hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2018) Move all journal stream management code into one place
Date Thu, 07 Jul 2011 22:48:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061638#comment-13061638

Todd Lipcon commented on HDFS-2018:

Some initial notes on the patch:

- the idea of the "remote edit log manifest" and the way we do edits transfer is inextricably
linked to the idea of log segments. But, the new JournalManager APIs are based on the idea
that logs are just sequences with no segmenting. I think having both ideas coexist is fairly
confusing and a good opening for bugs -- eg right now, the JournalManagers can return RemoteEditLogs
for any transaction range, but the GetImageServlet still expects files. If edits are to be
decoupled from files, then RemoteEditLogs should probably include a URI which identifies an
edits transfer method. For FileJournalManager, the URI would be http-based and simply point
to the GetImageServlet, but with BK-based logs it would point to the ZK ledger, right?

- in getEditLogManifest, why don't we use the already-existing JournalManagers instead of
recreating them based on storage directories?
- that same function swallows the IOE with no log
- same with selectInputStream -- creating new JournalManager instead of using existing list.
These functions are nearly static except for the "storage" member -- maybe they should just
be static methods to make them easier to test and clarify their tighter scope?

- why did you add the no-arg constructor to EditLogLoader? It's used in a few places, but
it seems those places then call validateEditLog non-statically even though it's a static function.
not sure why.

- some spurious changes elsewhere in FSEditLogLoader (unchecked cast, change to some case
- typo in javadoc: "If the file is stream during the header"
- rather than using "-1" as special value in this code, how about FSConstants.INVALID_TXID
(which has some other negative value so it's more obvious when it gets used)
- FSEditLogValidation now has non-orthogonal information -- the number of valid edits should
be exactly endTxId - startTxId + 1, right? We should remove either endTxId or numTransactions
- perhaps the validate code should also have a check that txids are always increasing by 1?

+    this.editsDirs = new ArrayList<URI>();
+    this.editsDirs.addAll(editsDirs);
why not just {{Lists.newArrayList(editsDirs)}}?

- the startup behavior with this patch changes such that the image is always loaded before
edits are even considered. In the case that the edits are somehow inaccessible or incomplete,
it will still load the image and only then realize that the logs can't be loaded. Can we restore
the previous behavior that a full "plan" is made before proceeding with the heavy lifting
of image loading?

- in {{loadEdits}} -- should log at INFO level each stream that is being loaded. This is important
for ops to know what the NN is up to doing startup.
- can we remove some of the dup code between old-format and new format? eg add a function
which returns Iterable<EditLogInputStream>? the "old" impl would create them based on
inspector.getLatestEditFiles, and the "new" impl would encapsulate the selectInputStream logic.
- it bugs me that "getLatestEditsFiles()" has a side effect. IMO functions called "get*" should
be side effect free.
- in selectInputStream, it's counting both finalized and unfinalized transactions. But at
startup, it should be recovering all of the inprogress logs to finalized logs, right? Given
that, I don't think we need the API {{getNumberOfTransactions}} -- ie we only need the finalized
- style nites: please camel case variable names, eg {{alllogfiles}} and {{logfiles}} should
be {{allLogFiles}}, {{logFiles}}. Also {{inprogress} -> {{inProgress}}.
- in {{getLogFiles}} there's a needless {{continue}} at the end of the loop
- maybe make the comparator used in that function a constant named something like EditLogFile.COMPARE_BY_START_TXID?
- Add javadoc for CorruptionException.
- the API change on the StorageArchiver interface seems less than ideal -- an archiver may
very well want to know the txid range of a log to know what to do with it -- any way we can
preserve this?

- {{static long readAll}} -- unused param {{nextTxId}}, unused {{count}} var. Seems like this
method isn't very useful either.
- this AbortSpec class in the tests needs some doc - finding it hard to understand the test
case. Perhaps a couple log messages interspersed in {{setupEdits}} would also be handy both
for debugging and for doc purposes.
- once I figured out what the test was doing, I liked it, though :) nice tests.
- a brief javadoc for each of the individual test cases would be handy too - eg explaining
what the file layout it's building looks like -- perhaps paste output of "find dfs/name/current
-name edits\*" in the javadoc?
- in {{testLoadingWithGaps}}, why call {{selectInputStream}} twice?
- extract a constant for the {{build/test/data}} dir -- you're using it lots of places both
in TestEditLog and TestFileJournalManager
- in {{testEditLogManifest}} could be more concise to construct {{editUris}}: {{ImmutableList.of(f1.toURI(),
f2.toURI(), f3.toURI())}}.
- still a TODO about log archiving - I didnt look at the mods in this test file yet.
- do the functional tests for log archiving pass?

> Move all journal stream management code into one place
> ------------------------------------------------------
>                 Key: HDFS-2018
>                 URL: https://issues.apache.org/jira/browse/HDFS-2018
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
> Currently in the HDFS-1073 branch, the code for creating output streams is in FileJournalManager
and the code for input streams is in the inspectors. This change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deals with URIs when referring to edit logs
>   - Recovery of inprogress logs is performed by counting the number of transactions instead
of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message