hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Luca Telloli (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5188) Modifications to enable multiple types of logging
Date Mon, 23 Feb 2009 18:14:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675998#action_12675998
] 

Luca Telloli commented on HADOOP-5188:
--------------------------------------

Konstantin> No I don't think logSync() method should be modified at least not the way you
did in your patch HADOOP-5189. Synchronization (locking) in the logSync() method prevents
different handler threads to write the same information (edits) multiple times into the same
edits file. Suppose you have 1 file stream (one edits file) and 2 threads call logSync() at
the same time. If you remove locking as you did in your patch then both threads may write
the same edits transactions into the same file twice. This will simply corrupt the edits file.



Konstantin> logSync() may be modified to write in parallel into different streams. If you
have 2 file streams then it makes sense to write into 2 files in parallel. But the information
should be written into each file only once and in the right order. And your patch breaks that,
afaiu.



I agree with you that the logSync() in the patch would not work with the file-based logging,
and that's why I put in my comments that this patch breaks it, but the purpose of the patch
is to work with BookKeeper! 



In the prototype included in the patch we do indeed remove synchronisation in logSync(), but
at the same time each thread has its own local buffer (a ThreadLocal variable) where it's
writing log modifications, and this guarantees that requests are processes in the order they
are received by the client, thus they do not violate the semantics of the file system. As
a proof I decided to run tests where I check the number of bytes written onto bookkeeper with
different logging systems and that number is the same which comforts me on this issue. 



Konstantin> My main objection to your approach though, is abstracting the whole EditsLog
class rather than just the EditLog*Streams. This leads to replication of a lot of code. Say,
both classes EditLog and BKEditLogThreadBuf have the loadFSEdits() methods, which are literally
identical, right? We should avoid that at all cost.



For BookKeeper integration *I really need to be able to override logSync()*, so I have to
keep going with the current abstract class unless a different way of overriding that method
is provided.  

In general, think that the abstract class has exactly the opposite effect of what you state,
allowing to place in the root class the common parts, but I agree that I might not have provided
you the cleanest code in this sense; I'll do some cleanup and post a new version. In the specific
case of loadFSEdits, if they are identical then you could remove the code from BKEditLogThreadBuf.




I also think that the abstract classes for Input/Output streams are cool and they do not conflict
with having an abstract class EditLog; in each test prototype in fact we used them


I'll try to have a patch that applies against trunk; and I'd also like to try to address dhruba
request of multiple logging systems, maybe I could try with HADOOP-4539? Is that patch supporting
multiple logging through multiple instances of the EditLogInput/OutputStream classes? I'd
like to avoid duplicating efforts but, in my case, I really need to be able to Override the
current logSync(). 


> Modifications to enable multiple types of logging 
> --------------------------------------------------
>
>                 Key: HADOOP-5188
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5188
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: Luca Telloli
>         Attachments: HADOOP-5188.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message