hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-3950) QJM: misc TODO cleanup, improved log messages, etc
Date Tue, 18 Sep 2012 21:31:12 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3950:

    Attachment: hdfs-3950.txt

- Removes the hardcoded timeout for attaining a quorum to write transactions. It is now configurable
(the default is still 20 sec)
- Change the stringification of QuorumJournalManager so that the web UI readout doesn't end up
so wide. We used to print the URI, which was very wide. Now there is a ", "-separated list
of addresses, so it can wrap onto multiple lines and display more nicely. Had to update a unit
test or two for this.
- Change the buffer capacity for the QuorumOutputStream to match the behavior of EditLogFileOutputStream
(i.e., it respects FSEditLog.setOutputBufferCapacity())
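
To illustrate the first item above, here is a minimal sketch of waiting for a quorum of acknowledgements with a configurable timeout. It is not the code from the patch: the class {{QuorumWaiter}}, the method {{waitForQuorum}}, and the use of CompletableFuture are assumptions made for illustration. Only the 20 second default comes from the description above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: names and structure are assumptions, not QJM's real classes.
class QuorumWaiter {
    // Mirrors the "default still 20sec" mentioned above; callers may pass another value.
    static final long DEFAULT_TIMEOUT_MS = 20_000;

    // Blocks until a majority of the acks complete successfully, or the timeout expires.
    static void waitForQuorum(List<CompletableFuture<Void>> acks, long timeoutMs)
            throws InterruptedException, TimeoutException {
        int majority = acks.size() / 2 + 1;
        CountDownLatch latch = new CountDownLatch(majority);
        for (CompletableFuture<Void> ack : acks) {
            // Only successful acknowledgements count toward the quorum.
            ack.whenComplete((ignored, err) -> {
                if (err == null) {
                    latch.countDown();
                }
            });
        }
        if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("Timed out after " + timeoutMs
                + " ms waiting for " + majority + " of " + acks.size()
                + " acknowledgements");
        }
    }
}
```

For simplicity this sketch waits out the full timeout even when too many acks have already failed for a quorum to be reachable. A real implementation would likely fail fast in that case.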

- Removed TODO:
-    // TODO: check that md5s match up between any "tied" logs

We removed the md5sum field in HDFS-3943. When we add it back, we can add a sanity check like this one.

- Removed a couple of TODOs, which I replaced with comments explaining why the current behavior
does in fact work.

- Reduced verbose logging during newEpoch(). The verbose logging of newEpoch() responses is
now at DEBUG level, with a less verbose one at INFO level.

- Removed a bunch of unused imports in various files.

- Replace use of deprecated RPC.getServer with the new Builder interface from Common.

- Address some TODOs in {{Journal.checkRequest}}. These are the most interesting non-trivial
changes from this patch:
-- Maintains the current IPC serial number and sanity-checks that serial numbers only increase
within a given epoch. This is defensive against bugs in the IPC layer, and would also defend against
a potential bug where multiple writers got assigned the same epoch.
-- Whenever we get an RPC from a new epoch (higher than lastPromisedEpoch), we treat that
as an explicit "promise" not to accept lower ones. This helps tighten our sanity checks:
we used to assign lastPromisedEpoch only as part of the {{newEpoch()}} call, and strictly speaking
that's all that's necessary. But re-assigning it on any higher-epoched RPC is extra-defensive.
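
The two checks above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: the class {{RequestChecker}}, the field {{currentEpochIpcSerial}}, the method signature, and the choice to reset the serial number when a new epoch is seen are all hypothetical. Only {{lastPromisedEpoch}} is named in the description.

```java
import java.io.IOException;

// Hypothetical sketch; everything except the name lastPromisedEpoch is an assumption.
class RequestChecker {
    private long lastPromisedEpoch = 0;
    // Highest IPC serial number accepted so far in the current epoch (-1 = none yet).
    private long currentEpochIpcSerial = -1;

    synchronized void checkRequest(long epoch, long ipcSerial) throws IOException {
        // Never accept a request from an epoch lower than one we have promised.
        if (epoch < lastPromisedEpoch) {
            throw new IOException("IPC's epoch " + epoch
                + " is less than the last promised epoch " + lastPromisedEpoch);
        }
        // Any RPC from a higher epoch is treated as a promise not to accept lower ones.
        if (epoch > lastPromisedEpoch) {
            lastPromisedEpoch = epoch;
            currentEpochIpcSerial = -1; // serial numbers start over in the new epoch
        }
        // Within one epoch, serial numbers must strictly increase.
        if (ipcSerial <= currentEpochIpcSerial) {
            throw new IOException("IPC serial " + ipcSerial
                + " is not greater than the last seen serial " + currentEpochIpcSerial
                + " in epoch " + epoch);
        }
        currentEpochIpcSerial = ipcSerial;
    }

    synchronized long getLastPromisedEpoch() {
        return lastPromisedEpoch;
    }
}
```

With this shape, a replayed or reordered RPC inside an epoch is rejected by the serial check, and two writers that somehow shared an epoch would trip the same check as soon as their serial numbers collided.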

- Include the client IP address in some of the more important INFO messages.

- Remove stale TODO:
-    // TODO: right now, a recovery of a segment when the log is
-    // completely emtpy (ie startLogSegment() but no txns)
-    // will fail this assertion here, since endTxId < startTxId
There are lots of tests for this circumstance now - it's been long since fixed.

- Adds a few new sanity checks that I thought of while reviewing the code.

- Adds a fault injection point between where a logger downloads a log segment and then persists
the metadata about that log segment. I had a hunch there might be a bug here, but it is successfully
passing the tests, so I think it turned out to not be a problem. The new fault injection point
uses the same strategy as CheckpointFaultInjector.
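
For readers unfamiliar with that strategy, the general pattern is a replaceable singleton whose hooks are no-ops in production and are overridden by tests. A minimal sketch follows. The names {{SegmentSyncFaultInjector}}, {{beforePersistMetadata}}, and {{SegmentSyncer}} are hypothetical, and the real code may differ in detail.

```java
import java.io.IOException;

// Hypothetical sketch of the injector pattern; not the classes from the patch.
class SegmentSyncFaultInjector {
    // Tests swap this instance for one that throws at the chosen point.
    static SegmentSyncFaultInjector instance = new SegmentSyncFaultInjector();

    static SegmentSyncFaultInjector get() {
        return instance;
    }

    // No-op in production; a test subclass can throw here to simulate a crash.
    void beforePersistMetadata() throws IOException {
    }
}

class SegmentSyncer {
    boolean downloaded = false;
    boolean metadataPersisted = false;

    void syncSegment() throws IOException {
        downloaded = true; // stands in for downloading the log segment
        // Injection point between the download and persisting the metadata.
        SegmentSyncFaultInjector.get().beforePersistMetadata();
        metadataPersisted = true; // stands in for persisting the segment metadata
    }
}
```

A test can then install an injector that throws, call {{syncSegment()}}, and verify that recovery copes with a segment that was downloaded but whose metadata was never persisted.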

- Improves {{PersistentLongFile}} to not re-write the file when the value has not changed.
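
The idea can be sketched as below. This is not Hadoop's {{PersistentLongFile}}: the class {{CachedLongFile}} and its {{writeCount}} field are hypothetical, and the write is deliberately simplistic (a real implementation would want an atomic write).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: caches the last known value and skips redundant writes.
class CachedLongFile {
    private final Path file;
    private final long defaultValue;
    private long cached;
    private boolean loaded = false;
    int writeCount = 0; // only here so the skipped write can be observed

    CachedLongFile(Path file, long defaultValue) {
        this.file = file;
        this.defaultValue = defaultValue;
    }

    synchronized long get() throws IOException {
        if (!loaded) {
            if (Files.exists(file)) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                cached = Long.parseLong(text.trim());
            } else {
                cached = defaultValue;
            }
            loaded = true;
        }
        return cached;
    }

    synchronized void set(long newValue) throws IOException {
        // Touch the disk only when the value is unknown or has actually changed.
        if (!loaded || cached != newValue) {
            Files.write(file, Long.toString(newValue).getBytes(StandardCharsets.UTF_8));
            writeCount++;
            cached = newValue;
            loaded = true;
        }
    }
}
```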

I ran this through my cluster fault injection test and it passed. I also ran findbugs, which
reported no issues, and ran the full unit test suite for qjournal, which passed.

> QJM: misc TODO cleanup, improved log messages, etc
> --------------------------------------------------
>                 Key: HDFS-3950
>                 URL: https://issues.apache.org/jira/browse/HDFS-3950
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Minor
>         Attachments: hdfs-3950.txt
> General JIRA for a bunch of miscellaneous clean-up in the QJM branch:
> - fix most remaining TODOs
> - improve some log/error messages
> - add some more sanity checks where appropriate
> - address any findbugs that might have crept into branch

