hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2018) 1073: Move all journal stream management code into one place
Date Wed, 03 Aug 2011 06:29:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078603#comment-13078603

Todd Lipcon commented on HDFS-2018:

I made it through reviewing part of the most recent patch. Here are some review comments (though
I still have more to look at):

*Specific issues:*
- needToSave() method has been reverted back to one with a TODO that always returns false
- new getEditLogManifest implementation:
-- why re-create FileJournalManagers for each directory instead of looping over the existing
journals array?
-- in testEditLogManifest, shouldn't you also test requesting a manifest mid-log-segment,
since the code seems to try to deal with this?
-- the conditions for determining if a log is "better" than the previous one don't seem quite
right here:
          } else if (candidate.getStartTxId() < bestlog.getStartTxId()) {
            bestlog = candidate;
... because any transactions that come before {{fromTxId}} are dead weight which will have
to be skipped over.
-- please add some comments to these if clauses explaining the rationale - it's slightly tricky
-- the catch clause in this function seems to have the wrong error message, since there shouldn't
be any counting going on. It also should include the exception as the second argument to the
log message, so a trace is logged
-- seems strange that getLogFiles() throws IOE if the parameter is in the middle of segment,
when the rest of the code now appears to try to deal with this case. Which is it?
- getRemoteEditLog needs JavaDoc
- FSEditLog.initJournals is now only called internally from the constructor - hence it can
be made private
- The name change of {{purgeLogsOlderThan}} to {{purgeTransactions}} seems to be a loss, since
it's no longer clear that the transactions to be purged have txid strictly less than the argument.
Can we rename to {{purgeTransactionsOlderThan}} instead?
-- the string status argument to {{mapJournalsAndReportErrors}} there should be updated to
match the new function name as well
- the addition of {{recoverUnclosedStreams}} in {{startLogSegment}} seems like the wrong spot.
Instead, I think it makes more sense to make this a public API of {{JournalManager}} and call
it (a) at startup, and (b) when we restore a journal manager which was previously marked failed
(i.e in {{JournalAndStream.startLogSegment}} only when {{stream == null}} before the call.
Later, we will probably want to make this asynchronous in some other thread, so the potentially
lengthy recovery doesn't block txns.
- what's the purpose of the {{inProgressCache}}? It seems a bit bug prone (static state scares
me), and the file will usually either get renamed to {{*.corrupt}} or finalized if you call
{{recoverUnclosedStreams}} first. I imagine the other functions like {{getInputStream}} and
{{getMaxLoadableTransactions}} will always be called _after_ the recovery step, and thus shouldn't
depend on the cache, right?
-- illustration of my above point: there's currently a bug such that {{countTransactionsInInprogress}}
will always return null the first time you call it on a file ({{putIfAbsent}} returns null
if it was previously absent). So, I also wonder whether there's sufficient testing of this
code path?
- in {{recoverUnclosedStreams}}, refactoring out the rename-to-.corrupt to a {{moveAsideCorruptFile}}
method as it was before would be nice. This could go in {{EditLogFile}} or a static method
in {{FileJournalManager}}, either way.
- when you moved FoundEditLog to EditLogFile, you retained {{file}} and {{lastTxId}} members
not being final, since the old {{finalizeLog}} method was mutating them. But, now I think
EditLogFile is completely immutable, right? So they could be made final again.
- not sure I follow why getNumberOfTransactions throws a CorruptionException if you've asked
for a transaction ID in the middle of a gap. Isn't it more consistent to just return 0? ie
that journal just wasn't being written to for that txid?
- for {{EditLogFile.getNumTransactions}} and {{EditLogValidation.getNumTransactions}}, it
seems like you should only return 0 if _both_ {{endTxId}} and {{startTxId}} are {{INVALID_TXID}}
-- even then, that seems like a kind of indeterminate state that would be produced by a bug.
When would we ever not know the {{startTxId}}? If the members were {{startTxId}} and {{numTransactions}}
then we wouldn't have the lack of clarity with what to do with the empty-log case.
- the new member {{FSImage.editsDirs}} is assigned but never read.

*style nits:*
- needless wrapping in {{recoverUnclosedStreams}} for assignment of {{elf}} var
- style:
      if (fromTxId > elf.startTxId
          && fromTxId <= elf.endTxId) {
should be formatted like:
      if ((fromTxId > elf.startTxId) &&
          (fromTxId <= elf.endTxId)) {
...to conform with other code
- JavaDoc style: first character after {{@param}} or {{@return}} should be lowercase, eg:
   * @return The number of transactions available from fromTxnId
...should have a lower case 'the'

*Potential split out patches*: maybe this could be separated into separate patches for some
of the following:
- refactor getEditLogManifest to call into the FileJournalManagers
- refactor FoundEditLog to new EditLogFile class in FileJournalManager
- remove use of LoadPlan and instead load from JournalManagers

*Overall questions*:
- what's the story on exclusive locking of journal managers after this? Do we still assume
that they'll be locked by FSImage? Or should we push down locking to the journal managers
as well?

> 1073: Move all journal stream management code into one place
> ------------------------------------------------------------
>                 Key: HDFS-2018
>                 URL: https://issues.apache.org/jira/browse/HDFS-2018
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 0.23.0
>         Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
HDFS-2018.diff, HDFS-2018.diff
> Currently in the HDFS-1073 branch, the code for creating output streams is in FileJournalManager
and the code for input streams is in the inspectors. This change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deals with URIs when referring to edit logs
>   - Recovery of inprogress logs is performed by counting the number of transactions instead
of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message