hadoop-hdfs-issues mailing list archives

From "Tummy Bunny (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11820) Thread safety in logEdit?
Date Mon, 15 May 2017 22:54:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011461#comment-16011461 ]

Tummy Bunny commented on HDFS-11820:

I beg you to read the code before commenting. The entire logic that gets the EditLog "singleton"
instance from the cache and modifies its attributes is outside the synchronized block and outside the writeLock.

> Thread safety in logEdit?
> -------------------------
>                 Key: HDFS-11820
>                 URL: https://issues.apache.org/jira/browse/HDFS-11820
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Tummy Bunny
> Hi there,
> I am new to Hadoop and trying to understand how things work under the hood by browsing
> through some of the code.
> I noticed a potential thread safety issue in FSEditLog.java in version 2.7.1, where
> the following pattern is used (the current trunk also uses the same pattern):
> 1. An instance of FSEditLogOp is retrieved from the cache for reuse.
> 2. Its attributes (e.g. path, timestamp, etc.) are set.
> 3. logEdit(*op*) is invoked. This method has a synchronized block in it, but also has a *wait*
> if auto-sync is scheduled.
> Now, suppose I have two almost simultaneous rename operations, each about to
> write to the edit log:
> Thread #1 acquires an instance of RenameOp, sets the attributes, and invokes logEdit, then
> waits because auto-sync is scheduled.
> Thread #2 catches up, acquires the same instance of RenameOp, sets *different* attributes,
> and invokes logEdit. It blocks because of the synchronized block inside logEdit(...), but by
> then it has already modified the attributes of RenameOp.
> The second RenameOp could end up being logged twice because both RenameOps are actually
> the same instance.
> The fix is to synchronize on *op* before calling logEdit(*op*), or to clone the op
> before using it.
> I could be wrong. Am I missing something?
> Thanks,
> Alexander Koentjara
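
The interleaving described in the quoted report can be replayed in a small, single-threaded Java sketch. This is not the Hadoop code: RenameOp, CACHED, and both replay methods are hypothetical stand-ins, and the sketch simply assumes, as the report does, that both callers receive the same cached instance. It shows the aliasing effect the report worries about, plus the "clone the op" remedy it proposes.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedOpSketch {
    // Hypothetical stand-in for a reusable edit-log op such as RenameOp.
    static final class RenameOp {
        String src;
        String dst;

        RenameOp set(String src, String dst) {
            this.src = src;
            this.dst = dst;
            return this;
        }

        // Independent snapshot of the current attribute values.
        RenameOp copy() {
            return new RenameOp().set(src, dst);
        }

        @Override
        public String toString() {
            return "rename " + src + " -> " + dst;
        }
    }

    // Assumption from the report: every caller gets this same instance.
    static final RenameOp CACHED = new RenameOp();

    // Replays the reported interleaving with the shared instance.
    static List<String> replayShared() {
        List<String> log = new ArrayList<>();
        RenameOp op1 = CACHED.set("/a", "/a2"); // thread #1 sets attributes
        // thread #1 is now waiting for the scheduled auto-sync
        RenameOp op2 = CACHED.set("/b", "/b2"); // thread #2 overwrites them
        log.add(op1.toString());                // thread #1 finally writes
        log.add(op2.toString());                // thread #2 writes
        return log;
    }

    // Same interleaving, but each caller snapshots the op before logging.
    static List<String> replayCopied() {
        List<String> log = new ArrayList<>();
        RenameOp op1 = CACHED.set("/a", "/a2").copy();
        RenameOp op2 = CACHED.set("/b", "/b2").copy();
        log.add(op1.toString());
        log.add(op2.toString());
        return log;
    }

    public static void main(String[] args) {
        // prints: shared: [rename /b -> /b2, rename /b -> /b2]
        System.out.println("shared: " + replayShared());
        // prints: copied: [rename /a -> /a2, rename /b -> /b2]
        System.out.println("copied: " + replayCopied());
    }
}
```

With the shared instance the second rename appears twice and the first is lost; with a copy per caller both renames survive. Whether the real cache can hand the same instance to two threads is exactly the point under dispute in this thread, so the sketch only illustrates the consequence if it does.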
