hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1799) Refactor log rolling and filename management out of FSEditLog
Date Fri, 22 Apr 2011 05:55:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023124#comment-13023124 ]

Todd Lipcon commented on HDFS-1799:

I've taken the first two patches that you broke out and committed them to the branch as HDFS-1858
and HDFS-1859.

Ivan: looking at the third in your patch series, I think there are a couple of problems with
the design:

- The concept of an EditLogOutputStream used to have a nice one-shot property just like Java
output streams. That is to say, they were created with a particular file, written to, and
then closed. Once one was closed, you needed to make a new instance to start writing to a new file.
I think this is much better than having a single instance that moves its output from place
to place while rolling.

The design I'd done above in this JIRA (the attachment from 4/1) keeps a clean separation
between single outputstreams (corresponding to one file) and a different class (JournalManager)
which takes care of "log lifecycle" issues like rolling, etc.

The reason I prefer the latter design is that it's harder to write bugs - once a stream has
been closed, if someone tries to write to that same instance, we'll get a clear exception.
This should help protect against race-condition bugs (we've had lots of these in the past
in this area of the code, and they're quite dangerous).

- Your patch also introduces several new APIs which are yet to be used (e.g. getURI, getInputStream,
etc). I know they're there in your design doc, but I've been striving to keep unused APIs
out of the work-in-progress branch - otherwise it's too easy to fall into traps in my opinion.
Without the code that uses these APIs, I'm not convinced that they're the right ones to build

- The new APIs "beginRoll", "isRolling", and "endRoll" aren't clear to me. I guess they correspond
to the state transition of "divert", "isDiverted", and "revert" in the current implementation,
but I don't see how they're any more general.

In contrast, the patch I posted on 4/1 is more like a straight refactor and keeps the same
terminology. When we do the bigger switchover to the new filenames, it makes sense to switch
the API names at that point in my opinion.
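
To illustrate the first point above (one-shot streams plus a separate class that owns rolling),
here is a minimal Java sketch. It is not code from either patch: OneShotEditStream,
SimpleJournalManager, the "edits_N" segment names, and the in-memory buffer standing in for a
file are all hypothetical and assumed only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stream bound to exactly one log file.
final class OneShotEditStream {
    private final String name;
    // In-memory buffer used here in place of a real file (assumption for the sketch).
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    private boolean closed = false;

    OneShotEditStream(String name) {
        this.name = name;
    }

    synchronized void write(byte[] record) throws IOException {
        if (closed) {
            // A late writer fails loudly instead of landing in the wrong file.
            throw new IOException("stream for " + name + " is already closed");
        }
        buf.write(record, 0, record.length);
    }

    synchronized void close() {
        closed = true;
    }

    String name() {
        return name;
    }

    synchronized int size() {
        return buf.size();
    }
}

// Hypothetical owner of the log lifecycle: rolling never re-targets a stream,
// it closes the current one and hands out a brand-new instance.
final class SimpleJournalManager {
    private final List<OneShotEditStream> finished = new ArrayList<>();
    private OneShotEditStream current;
    private int nextSegment = 0;

    SimpleJournalManager() {
        current = newStream();
    }

    private OneShotEditStream newStream() {
        return new OneShotEditStream("edits_" + (nextSegment++));
    }

    synchronized OneShotEditStream current() {
        return current;
    }

    synchronized OneShotEditStream roll() {
        current.close();
        finished.add(current);
        current = newStream();
        return current;
    }

    synchronized int finishedCount() {
        return finished.size();
    }
}

final class OneShotDemo {
    public static void main(String[] args) throws IOException {
        SimpleJournalManager jm = new SimpleJournalManager();
        OneShotEditStream stale = jm.current();
        stale.write(new byte[] {1, 2, 3});
        OneShotEditStream fresh = jm.roll();
        fresh.write(new byte[] {4});
        try {
            stale.write(new byte[] {5}); // a caller still holding the old reference
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch a caller that keeps a reference to the pre-roll stream gets an IOException on
its next write, rather than silently writing to whatever file the stream now points at.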

Would you be OK with keeping this JIRA as a pretty clean refactor (like my 4/1 patch)? I'll
try to post a preliminary version of the next patch in the sequence tonight or tomorrow so
you can see how the API changes for the sequenced log files.

> Refactor log rolling and filename management out of FSEditLog
> -------------------------------------------------------------
>                 Key: HDFS-1799
>                 URL: https://issues.apache.org/jira/browse/HDFS-1799
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: 0001-Added-state-management-to-FSEditLog.patch, 0002-Standardised-error-pattern.patch,
> 0003-Add-JournalFactory-and-move-divert-revert-out-of-FSE.patch, HDFS-1799-all.diff, hdfs-1799.txt,
>
> This is somewhat similar to HDFS-1580, but less ambitious. While that JIRA focuses on
> pluggability, this task is simply the minimum needed for HDFS-1073:
> - Refactor the filename-specific code for rolling, diverting, and reverting log streams
> out of FSEditLog into a new class
> - Clean up the related code in FSEditLog a bit
> Notably, this JIRA is going to temporarily break the BackupNode. I plan to circle back
> on the BackupNode later on this branch.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
