hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-3950) QJM: misc TODO cleanup, improved log messages, etc
Date Tue, 18 Sep 2012 21:31:12 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-3950:
------------------------------

    Attachment: hdfs-3950.txt

- Removes the hardcoded timeout for attaining a quorum to write transactions. It is now configurable
(default still 20sec).
- Changes the stringification of QuorumJournalManager so that the web UI readout doesn't end up
so wide. We used to print the URI, which was very wide. Now there is a ", "-separated list
of addresses, so it can wrap to multiple lines and display more nicely. Had to update a unit
test or two for this.
- Changes the buffer capacity for the QuorumOutputStream to match the behavior of EditLogFileOutputStream
(i.e. it respects FSEditLog.setOutputBufferCapacity()).
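
To illustrate the first item, here is a minimal sketch of waiting for a write quorum with a caller-supplied timeout. It is not the patch's code: the class name {{QuorumWaitSketch}}, the method {{waitForQuorum}}, and the use of {{CompletableFuture}} to stand in for per-journal acknowledgements are all assumptions made for illustration. Only the 20sec default comes from the description above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: wait for a majority of acks, bounded by a configurable timeout. */
final class QuorumWaitSketch {
    /** Default taken from the description above (20sec); the constant name is an assumption. */
    static final long DEFAULT_TIMEOUT_MS = 20_000L;

    /**
     * Returns true if a majority of the given acknowledgements completed
     * successfully within timeoutMs, false if the timeout expired first.
     */
    static boolean waitForQuorum(List<CompletableFuture<Void>> acks, long timeoutMs)
            throws InterruptedException {
        int majority = acks.size() / 2 + 1;
        CountDownLatch latch = new CountDownLatch(majority);
        for (CompletableFuture<Void> ack : acks) {
            // thenRun fires only on successful completion, so failures never count.
            ack.thenRun(latch::countDown);
        }
        return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

Passing the timeout as a parameter, rather than reading a constant inside the method, is what lets a caller feed in a configured value and lets tests use a short one.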

- Removed TODO:
{code}
-    // TODO: check that md5s match up between any "tied" logs
{code}

We removed the md5sum field in HDFS-3943. When we add it back, we can add a sanity check like
this.

- Removed a couple of TODOs, which I replaced with comments explaining why the current behavior
does in fact work.

- Reduced verbose logging during newEpoch(). The verbose logging of newEpoch() responses is
now at DEBUG level, with a less verbose one at INFO level.

- Removed a bunch of unused imports in various files.

- Replace use of deprecated RPC.getServer with the new Builder interface from Common.

- Address some TODOs in {{Journal.checkRequest}}. These are the most interesting non-trivial
changes from this patch:
-- Maintains the current IPC serial number and sanity-checks that serial numbers only increase
within a given epoch. This is defensive against bugs in the IPC layer, and would also defend against
a potential bug where multiple writers got assigned the same epoch.
-- Whenever we get an RPC from a new epoch (higher than lastPromisedEpoch), we treat that
as an explicit "promise" not to accept lower ones. This helps tighten our sanity checks -
we used to only assign lastPromisedEpoch as part of the {{newEpoch()}} change, and strictly
that's all that's necessary. But re-assigning it on any higher-epoched RPC is extra-defensive.
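
The two checks described above can be sketched roughly as follows. This is a simplified stand-in, not the actual {{Journal.checkRequest}}: the class name, field names, and exception messages are hypothetical, nothing is persisted to disk, and resetting the serial number when a higher epoch arrives is an assumption of this sketch.

```java
import java.io.IOException;

/** Hypothetical, in-memory stand-in for the epoch and IPC serial checks. */
final class EpochCheckSketch {
    private long lastPromisedEpoch = 0;
    private long currentEpochIpcSerial = -1;

    /** Rejects requests from stale epochs and non-increasing serial numbers. */
    synchronized void checkRequest(long epoch, long ipcSerial) throws IOException {
        if (epoch < lastPromisedEpoch) {
            throw new IOException("IPC epoch " + epoch
                + " is less than the last promised epoch " + lastPromisedEpoch);
        }
        if (epoch > lastPromisedEpoch) {
            // Treat any RPC from a higher epoch as a promise not to accept lower ones.
            lastPromisedEpoch = epoch;
            // Assumption: serial numbers start over for each new epoch.
            currentEpochIpcSerial = -1;
        }
        if (ipcSerial <= currentEpochIpcSerial) {
            throw new IOException("IPC serial " + ipcSerial
                + " is not greater than the previous serial " + currentEpochIpcSerial);
        }
        currentEpochIpcSerial = ipcSerial;
    }

    synchronized long getLastPromisedEpoch() {
        return lastPromisedEpoch;
    }
}
```

In this sketch, a request with epoch 3 arriving after epoch 1 raises the promised epoch to 3, after which a request with epoch 2 is rejected.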

- Include the client IP address in some of the more important INFO messages.

- Remove stale TODO:
{code}
-    // TODO: right now, a recovery of a segment when the log is
-    // completely emtpy (ie startLogSegment() but no txns)
-    // will fail this assertion here, since endTxId < startTxId
{code}
There are lots of tests for this circumstance now - it's been long since fixed.

- Adds a few new sanity checks that I thought of while reviewing the code.

- Adds a fault injection point between where a logger downloads a log segment and then persists
the metadata about that log segment. I had a hunch there might be a bug here, but it is successfully
passing the tests, so I think it turned out to not be a problem. The new fault injection point
uses the same strategy as CheckpointFaultInjector.
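
For readers unfamiliar with that strategy, the general pattern is a replaceable injector object whose hooks do nothing in production and which tests swap for one that throws. The sketch below shows the pattern only. The names {{FaultInjectorSketch}}, {{beforePersistMetadata}}, and {{SegmentSyncSketch}} are hypothetical and do not come from the patch.

```java
import java.io.IOException;

/** Hypothetical injector: hooks are no-ops unless a test replaces the instance. */
class FaultInjectorSketch {
    static FaultInjectorSketch instance = new FaultInjectorSketch();

    /** Called between downloading a segment and persisting its metadata. */
    void beforePersistMetadata() throws IOException {
        // Intentionally empty in production.
    }
}

/** Hypothetical logger-side code with one injection point. */
final class SegmentSyncSketch {
    boolean downloaded = false;
    boolean metadataPersisted = false;

    void syncSegment() throws IOException {
        downloaded = true;                                   // stand-in for the download step
        FaultInjectorSketch.instance.beforePersistMetadata(); // injection point
        metadataPersisted = true;                            // stand-in for the persist step
    }
}
```

A test can assign a subclass that throws {{IOException}} to {{FaultInjectorSketch.instance}} and then assert on the state left behind: segment downloaded, metadata not yet persisted.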

- Improves {{PersistentLongFile}} to not re-write the file when the value has not changed.
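
A minimal sketch of the skip-if-unchanged idea, assuming a file that holds a single long as text. This is not the real {{PersistentLongFile}}: the class name, the {{writeCount}} field (present only to make the behavior observable), and the direct non-atomic write are all simplifications of this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a long persisted to a file, skipping redundant writes. */
final class PersistentLongSketch {
    private final Path file;
    private final long defaultVal;
    private long value;
    private boolean loaded = false;
    int writeCount = 0; // instrumentation for illustration only

    PersistentLongSketch(Path file, long defaultVal) {
        this.file = file;
        this.defaultVal = defaultVal;
    }

    synchronized long get() throws IOException {
        if (!loaded) {
            if (Files.exists(file)) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                value = Long.parseLong(text.trim());
            } else {
                value = defaultVal;
            }
            loaded = true;
        }
        return value;
    }

    synchronized void set(long newVal) throws IOException {
        // Write only if the on-disk value is unknown or actually differs.
        if (!loaded || value != newVal) {
            // Simplification: a direct write, which is not crash-safe.
            Files.write(file, (newVal + "\n").getBytes(StandardCharsets.UTF_8));
            writeCount++;
        }
        value = newVal;
        loaded = true;
    }
}
```

Calling {{set(5)}} twice in a row touches the file once in this sketch. The second call sees that the cached value already matches and returns without any I/O.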

I ran this through my cluster fault injection test and it passed. I also ran findbugs and
there are no issues found. Ran the full unit test suite for qjournal and it passed.
                
> QJM: misc TODO cleanup, improved log messages, etc
> --------------------------------------------------
>
>                 Key: HDFS-3950
>                 URL: https://issues.apache.org/jira/browse/HDFS-3950
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Minor
>         Attachments: hdfs-3950.txt
>
>
> General JIRA for a bunch of miscellaneous clean-up in the QJM branch:
> - fix most remaining TODOs
> - improve some log/error messages
> - add some more sanity checks where appropriate
> - address any findbugs that might have crept into branch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
