hadoop-hdfs-issues mailing list archives

From "Ivan Kelly (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1799) Refactor log rolling and filename management out of FSEditLog
Date Sat, 23 Apr 2011 11:18:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023546#comment-13023546 ]

Ivan Kelly commented on HDFS-1799:

> The concept of an EditLogOutputStream used to have a nice one-shot property just like
> Java output streams. That is to say, they were created with a particular file, written to,
> and then closed. Once closed, you need to make a new instance to start writing to a new file.
> I think this is much better than having a single instance which moves its output from place
> to place while rolling.
>
> The design I'd done above in this JIRA (the attachment from 4/1) keeps a clean separation
> between single output streams (corresponding to one file) and a different class (JournalManager)
> which takes care of "log lifecycle" issues like rolling, etc.
>
> The reason I prefer the latter design is that it's harder to write bugs - once a stream has
> been closed, if someone tries to write to that same instance, we'll get a clear exception.
> This should help protect against race condition bugs (we've had lots of these in the past
> in this area of the code, and they're quite dangerous).

The JournalManager doesn't actually preserve this nice one-shot property, as you never
work directly with the EditLogOutputStream. You always access it through the JournalManager.

JournalManager just seems to hide EditLogOutputStream from FSEditLog, and using it means FSEditLog
needs to be touched in lots of places, not just rolling. 
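
For illustration, the one-shot property under discussion can be sketched roughly as follows. This is a minimal sketch, not code from any patch on this JIRA: `OneShotLogStream` is a hypothetical stand-in for an edit log stream, and it writes to an in-memory buffer rather than a file.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for an edit log stream; NOT the real HDFS class.
class OneShotLogStream {
    private final String name;
    // In-memory buffer used here in place of a real file (assumption for the sketch).
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed = false;

    OneShotLogStream(String name) {
        this.name = name;
    }

    // Writing after close() fails loudly instead of silently going elsewhere.
    void write(byte[] record) {
        if (closed) {
            throw new IllegalStateException(name + " is already closed");
        }
        buffer.write(record, 0, record.length);
    }

    void close() {
        closed = true;
    }

    int size() {
        return buffer.size();
    }
}

class OneShotDemo {
    public static void main(String[] args) {
        OneShotLogStream stream = new OneShotLogStream("edits");
        stream.write(new byte[] {1, 2, 3});
        stream.close();
        try {
            stream.write(new byte[] {4}); // a late writer hits a clear exception
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

A caller that holds a reference to a closed instance cannot accidentally write into a later file; it has to obtain a new instance first.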

> The new APIs "beginRoll", "isRolling", and "endRoll" aren't clear to me. I guess they
> correspond to the state transition of "divert", "isDiverted", and "revert" in the current
> implementation, but I don't see how they're any more general.

They're not any more generic. These were only meant to exist while edits.new still exists. Once
the sequential naming for files comes in, they could be replaced with a single call to a roll()
method. This could be implemented so that it returns a new EditLogOutputStream. I think this
would give the edit log output stream the one-shot property you mentioned above.
class EditLogOutputStream {
    /**
     * Create a new EditLogOutputStream pointing to edits.new
     * in the storage directory of the current stream
     */
    EditLogOutputStream newStream();

    /** Redirect edits.new to edits */
    void endRoll();
}
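
As a rough illustration of the roll() idea mentioned above, where rolling returns a new stream instead of redirecting an existing one, a sketch could look like this. All names here (`SegmentStream`, `roll(String)`, the file names) are hypothetical and chosen for the example; edits are kept in a list rather than written to disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names do not come from the HDFS code base.
class SegmentStream {
    private final String fileName;
    // Edits are kept in memory for the sketch instead of being written to disk.
    private final List<String> edits = new ArrayList<>();
    private boolean closed = false;

    SegmentStream(String fileName) {
        this.fileName = fileName;
    }

    void write(String edit) {
        if (closed) {
            throw new IllegalStateException(fileName + " is closed");
        }
        edits.add(edit);
    }

    // Closes this stream and returns a fresh instance for the next file,
    // so each instance only ever corresponds to one file.
    SegmentStream roll(String nextFileName) {
        closed = true;
        return new SegmentStream(nextFileName);
    }

    String fileName() {
        return fileName;
    }

    int editCount() {
        return edits.size();
    }
}

class RollDemo {
    public static void main(String[] args) {
        SegmentStream current = new SegmentStream("edits_1");
        current.write("op-a");
        SegmentStream next = current.roll("edits_2");
        next.write("op-b");
        System.out.println(current.fileName() + " holds " + current.editCount() + " edit(s)");
        System.out.println(next.fileName() + " holds " + next.editCount() + " edit(s)");
    }
}
```

With this shape the caller swaps its reference on every roll, and the old instance rejects further writes.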

> In contrast, the patch I posted on 4/1 is more like a straight refactor and keeps the same
> terminology. When we do the bigger switchover to the new filenames, it makes sense to switch
> the API names at that point in my opinion.
>
> Would you be OK with keeping this JIRA as a pretty clean refactor (like my 4/1 patch)? I'll
> try to post a preliminary version of the next patch in the sequence tonight or tomorrow so
> you can see how the API changes for the sequenced log files.

This isn't just a clean refactor. It sets the direction for future work on the generic interface.
I think some input from the other guys would be useful before proceeding, to avoid this JIRA
ending up like HDFS-311.

> Refactor log rolling and filename management out of FSEditLog
> -------------------------------------------------------------
>                 Key: HDFS-1799
>                 URL: https://issues.apache.org/jira/browse/HDFS-1799
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: 0001-Added-state-management-to-FSEditLog.patch, 0002-Standardised-error-pattern.patch,
>                      0003-Add-JournalFactory-and-move-divert-revert-out-of-FSE.patch, HDFS-1799-all.diff, hdfs-1799.txt,
>                      hdfs-1799.txt, hdfs-1799.txt
> This is somewhat similar to HDFS-1580, but less ambitious. While that JIRA focuses on
> pluggability, this task is simply the minimum needed for HDFS-1073:
> - Refactor the filename-specific code for rolling, diverting, and reverting log streams
>   out of FSEditLog into a new class
> - Clean up the related code in FSEditLog a bit
> Notably, this JIRA is going to temporarily break the BackupNode. I plan to circle back
> on the BackupNode later on this branch.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
