hadoop-hdfs-issues mailing list archives

From "Ivan Kelly (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1799) Refactor log rolling and filename management out of FSEditLog
Date Tue, 03 May 2011 09:03:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028114#comment-13028114 ]

Ivan Kelly commented on HDFS-1799:

I agree that your design avoids introducing a new class by collapsing the file management
(i.e. "log lifecycle") code into EditLogOutputStream. However, I don't feel like "fewer code
changes" or "fewer classes" are particularly appropriate design goals.

More code changes and more classes indicate that more complexity is being added, which I think
is what is happening here. Instead of FSEditLog being a manager of log objects, it becomes
a manager of managers of log objects. More complexity is fine as long as it adds clarity to
the design, or the ability to do things previously not possible. I don't think JournalManager
adds either.

The nice thing about this design is that EditLogOutputStream needs no conception of StorageDirectories,
for example. In fact, it is barely coupled to HDFS at all with the exception of using a few
static constants.

In contrast, your design collapses layout management into the output stream, thus making EditLogFileOutputStream
depend on StorageDirectory, NNStorage, etc.

I don't see any harm in EditLogFileOutputStream knowing about StorageDirectory and NNStorage
as long as this doesn't pollute EditLogOutputStream. It is the implementation of EditLogOutputStream
for files after all, and files in HDFS should be dealt with through StorageDirectory.

My design only collapses the API into the output stream, not the actual implementation. #nextSegment()
could make a call to a JournalManager/Journal/JournalFactory which would actually do the file
management. Collapsing the API isn't that different to what happens with JournalManager anyhow,
as all accesses to the stream go through the manager.
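
To make the "API in the stream, implementation elsewhere" idea concrete, here is a minimal sketch. It is not code from any patch on this issue: the class NextSegmentSketch, the SketchOutputStream stand-in for EditLogOutputStream, the startSegment() method and the "edits_<txid>" names are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class NextSegmentSketch {
    /** Hypothetical factory: the only place that knows how segments are laid out. */
    public interface JournalFactory {
        String startSegment(long firstTxId);
    }

    /** Hypothetical stand-in for EditLogOutputStream; it exposes nextSegment() but delegates the work. */
    public static class SketchOutputStream {
        private final JournalFactory factory;
        private final List<String> written = new ArrayList<>();
        private String currentSegment;

        public SketchOutputStream(JournalFactory factory, long firstTxId) {
            this.factory = factory;
            this.currentSegment = factory.startSegment(firstTxId);
        }

        /** Records the operation against whichever segment is current. */
        public void write(String op) {
            written.add(currentSegment + ":" + op);
        }

        /** The API lives on the stream; the layout decision is made by the factory. */
        public void nextSegment(long firstTxId) {
            currentSegment = factory.startSegment(firstTxId);
        }

        public List<String> written() {
            return written;
        }
    }

    public static void main(String[] args) {
        // A file-like factory; the naming scheme is an assumption for this sketch.
        JournalFactory files = txId -> "edits_" + txId;
        SketchOutputStream out = new SketchOutputStream(files, 1L);
        out.write("mkdir");
        out.nextSegment(2L);
        out.write("delete");
        System.out.println(out.written()); // prints [edits_1:mkdir, edits_2:delete]
    }
}
```

In this sketch the caller holds only the stream, and the stream holds the factory because it needs it again for each roll.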

Lastly, I want to point out that HDFS-1580 will require a class like this anyway (called Journal
in your design doc). Though this current patch doesn't address it, it will be a clear extension
of JournalManager to add the input-side calls, the purging calls, etc.
The difference between JournalManager and Journal/JournalFactory in HDFS-1580 is one of persistence.

For output, JournalManager objects must be created and held by FSEditLog for the duration
of its existence. Journal/JournalFactory objects only exist for the creation of the stream,
after which the object can go out of scope/be GC'd (unless the created streams keep a reference).
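
The lifetime difference can be sketched roughly as below. This is only an illustration under assumptions: the Stream interface, ManagerHoldingLog, StreamHoldingLog and createStream() are hypothetical names, and only getCurrentStream() is taken from this discussion.

```java
import java.util.ArrayList;
import java.util.List;

public class LifetimeSketch {
    public interface Stream { void write(String op); }

    /** Manager style: the log keeps the manager and asks it for the stream on every write. */
    public interface JournalManager { Stream getCurrentStream(); }

    /** Factory style (hypothetical createStream()): used once, then no longer needed by the log. */
    public interface JournalFactory { Stream createStream(); }

    public static class ManagerHoldingLog {
        private final JournalManager manager; // held for the lifetime of the log

        public ManagerHoldingLog(JournalManager manager) { this.manager = manager; }

        public void logEdit(String op) { manager.getCurrentStream().write(op); }
    }

    public static class StreamHoldingLog {
        private final Stream stream; // only the stream is retained

        public StreamHoldingLog(JournalFactory factory) {
            this.stream = factory.createStream(); // factory reference is not kept
        }

        public void logEdit(String op) { stream.write(op); }
    }

    public static void main(String[] args) {
        List<String> edits = new ArrayList<>();
        int[] lookups = {0}; // counts calls to getCurrentStream()

        ManagerHoldingLog viaManager =
            new ManagerHoldingLog(() -> { lookups[0]++; return edits::add; });
        StreamHoldingLog viaFactory = new StreamHoldingLog(() -> edits::add);

        viaManager.logEdit("a");
        viaManager.logEdit("b");
        viaFactory.logEdit("c");

        System.out.println(lookups[0] + " " + edits); // prints 2 [a, b, c]
    }
}
```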

Clearly we have a difference of opinion on this design, but could you please indicate how
strong your objections are? i.e. are you -1ing this design or just proposing another option?
Given that I already have a bunch of work lined up (and blocked) behind this, I'd really like
to close this out in the next day or two.
I'm -1. I really don't like that getCurrentStream() has to be called every time
you want an output stream. However, if the consensus goes the other way, I'll acquiesce.

> Refactor log rolling and filename management out of FSEditLog
> -------------------------------------------------------------
>                 Key: HDFS-1799
>                 URL: https://issues.apache.org/jira/browse/HDFS-1799
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: 0001-Added-state-management-to-FSEditLog.patch, 0002-Standardised-error-pattern.patch, 0003-Add-JournalFactory-and-move-divert-revert-out-of-FSE.patch, HDFS-1799-all.diff, hdfs-1799.txt, hdfs-1799.txt, hdfs-1799.txt, hdfs-1799.txt
>
> This is somewhat similar to HDFS-1580, but less ambitious. While that JIRA focuses on pluggability, this task is simply the minimum needed for HDFS-1073:
> - Refactor the filename-specific code for rolling, diverting, and reverting log streams out of FSEditLog into a new class
> - Clean up the related code in FSEditLog a bit
> Notably, this JIRA is going to temporarily break the BackupNode. I plan to circle back on the BackupNode later on this branch.

