hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5188) Modifications to enable multiple types of logging
Date Wed, 04 Mar 2009 03:18:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678564#action_12678564 ]

Konstantin Shvachko commented on HADOOP-5188:
---------------------------------------------

I think you are trying to replace the EditLogOutputStream abstraction with an EditLog abstraction.
Let me try to explain:
- The FSImage object deals (or is supposed to deal) with everything related to the file system's
persistent image.
- FSEditLog should deal with everything related to journaling.
- There are different ways of journaling, and this should be reflected by the abstract
EditLogOutputStream class.

Your approach would leave FSImage holding an array of EditLog(s). You would then have to
introduce an {{FSImage.logSync()}} method that loops over all EditLogs and calls their respective
{{EditLog.logSync()}} methods. But this is exactly what the current {{FSEditLog.logSync()}}
method does: it loops through the EditLogOutputStreams and calls flushAndSync() on each of them.
The same applies to other operations: logEdit(), processIOError().
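
To illustrate the layering described above, here is a minimal sketch. It is not the actual
Hadoop code: the class names (SketchOutputStream, InMemorySketchStream, SketchEditLog) and
the String-based edit records are hypothetical stand-ins, and only flushAndSync(), logEdit()
and logSync() echo names from this discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the abstract journaling stream; not the real Hadoop class.
abstract class SketchOutputStream {
    abstract void write(String edit);   // buffer one edit record
    abstract void flushAndSync();       // make buffered edits durable
}

// Hypothetical in-memory stream standing in for a file, backup, or BK implementation.
class InMemorySketchStream extends SketchOutputStream {
    final List<String> buffered = new ArrayList<>();
    final List<String> durable = new ArrayList<>();

    @Override
    void write(String edit) {
        buffered.add(edit);
    }

    @Override
    void flushAndSync() {
        durable.addAll(buffered);
        buffered.clear();
    }
}

// The logic common to all journal types lives here once and fans out to every stream.
class SketchEditLog {
    private final List<SketchOutputStream> streams = new ArrayList<>();

    void addStream(SketchOutputStream stream) {
        streams.add(stream);
    }

    synchronized void logEdit(String edit) {
        for (SketchOutputStream s : streams) {
            s.write(edit);
        }
    }

    synchronized void logSync() {
        for (SketchOutputStream s : streams) {
            s.flushAndSync();
        }
    }
}
```

With this shape, adding a new logging type means adding one more stream subclass; FSImage
never needs to loop over several edit logs itself.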

So the idea is that EditLog should combine the logic common to all journaling streams (logging
types). The specifics of each kind of journaling should be contained within implementations of
EditLogOutputStream.

I agree with Ben: FSEditLog was originally written for file-based journals and still contains
code specific to that type, and it could be optimized.
I can see that waiting for the whole batch of edits to complete makes the BookKeeper stream
less efficient.
But that does not mean that FSEditLog should be overloaded; it just means that one method,
logSync(), should be generalized to allow an efficient implementation of BK streams *as well as*
other (file and backup) streams.

So, the proposal: let's put an if statement in logSync() for now that checks whether all streams
are BookKeeper streams. If they are, logSync() either skips its synchronized sections
(and so avoids waiting) or, alternatively, calls a BK-specific method.
I say this because if the name-node uses BK logging together with other logging types, then
logging will proceed at the speed of the slowest journal. So the only case in which BK can
benefit from an optimized logSync() is when there are no streams of any type other than BK.
Hope that makes sense.
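
A rough sketch of what that if statement could look like follows. Everything in it is an
assumption made for illustration: JournalStream, CountingStream, GeneralizedEditLog,
isBookKeeper() and the lastPath field are hypothetical names, and the real logSync() does
considerably more bookkeeping than shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stream abstraction; isBookKeeper() is an assumed type check.
abstract class JournalStream {
    abstract boolean isBookKeeper();
    abstract void flushAndSync();
}

// Hypothetical stream that only counts how many times it was synced.
class CountingStream extends JournalStream {
    private final boolean bookKeeper;
    int syncs = 0;

    CountingStream(boolean bookKeeper) {
        this.bookKeeper = bookKeeper;
    }

    @Override
    boolean isBookKeeper() {
        return bookKeeper;
    }

    @Override
    void flushAndSync() {
        syncs++;
    }
}

class GeneralizedEditLog {
    private final List<JournalStream> streams = new ArrayList<>();
    String lastPath = "none"; // records which branch ran, for demonstration only

    void addStream(JournalStream stream) {
        streams.add(stream);
    }

    private boolean allBookKeeper() {
        if (streams.isEmpty()) {
            return false;
        }
        for (JournalStream s : streams) {
            if (!s.isBookKeeper()) {
                return false;
            }
        }
        return true;
    }

    void logSync() {
        if (allBookKeeper()) {
            // BK-only case: skip the synchronized section, so no waiting on a batch.
            // Thread safety of the stream list itself is not addressed in this sketch.
            lastPath = "bk-only";
            for (JournalStream s : streams) {
                s.flushAndSync();
            }
        } else {
            // Mixed or non-BK case: keep the existing synchronized behaviour, since
            // logging proceeds at the speed of the slowest journal anyway.
            synchronized (this) {
                lastPath = "synchronized";
                for (JournalStream s : streams) {
                    s.flushAndSync();
                }
            }
        }
    }
}
```

As soon as one file or backup stream is registered, the check fails and the call falls back
to the synchronized path, which matches the point that BK only benefits when it is the sole
stream type.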

> Modifications to enable multiple types of logging 
> --------------------------------------------------
>
>                 Key: HADOOP-5188
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5188
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: Luca Telloli
>         Attachments: HADOOP-5188.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

