hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ivan Kelly (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1799) Refactor log rolling and filename management out of FSEditLog
Date Fri, 29 Apr 2011 11:51:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026940#comment-13026940

Ivan Kelly commented on HDFS-1799:

My primary criticism of this patch is that it introduces the JournalManager interface which
is unnecessary in my view. It creates a intermediatory layer between FSEditLog and OutputStream.
In one sense this is a straight refactor. You've just moved revert and divert into a different
class. However, this straight refactor has required that you change almost every method in
FSEditLog to use a list of JournalManagers instead of EditLogOutputStreams. 

JournalManager currently deals only with output. Its not clear how input of the streams will
be handled. open(), close(), abort(), restore(), getCurrentStream(), setBufferCapacity() are
all output only. How do you plan to add input methods to this? If you don't, then you've added
a new abstraction for output streams, which should be rolled directly into output stream.

Lastly, this patch doesn't hide the fact that files are in use here. FSEditLog still knows
that it's using edits and edits.new. Are different filenames ever used? The only other place
I know of is the Backup spool, which is disabled on this branch.

In summary, this patch...
   1. Changes too much code.
   2. Unclear how input functions will be added in the future.
   3. Doesn't hide the fact that files are still in use.

Sorry for being a pain in the ass about this. I expect you're quite frustrated by it. I do
think there needs to be unified view of how this will look in the future though, as getting
the right abstractions now will prevent a lot of pain later. Regarding LogSegment, I'm still
on the fence about this. I see some of the merits in it, and I can see why JournalMananger
is possibly needed for it. If you retain the list of EditLogOutputStreams as it is now, you'll
have to maintain a separate list of some other objects for the creation of new log segments.
This could be problematic as these lists need to be kept in sync for error handling etc. 

> Refactor log rolling and filename management out of FSEditLog
> -------------------------------------------------------------
>                 Key: HDFS-1799
>                 URL: https://issues.apache.org/jira/browse/HDFS-1799
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: 0001-Added-state-management-to-FSEditLog.patch, 0002-Standardised-error-pattern.patch,
0003-Add-JournalFactory-and-move-divert-revert-out-of-FSE.patch, HDFS-1799-all.diff, hdfs-1799.txt,
hdfs-1799.txt, hdfs-1799.txt, hdfs-1799.txt
> This is somewhat similar to HDFS-1580, but less ambitious. While that JIRA focuses on
pluggability, this task is simply the minimum needed for HDFS-1073:
> - Refactor the filename-specific code for rolling, diverting, and reverting log streams
out of FSEditLog into a new class
> - Clean up the related code in FSEditLog a bit
> Notably, this JIRA is going to temporarily break the BackupNode. I plan to circle back
on the BackupNode later on this branch.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message