hadoop-hdfs-issues mailing list archives

From "Tummy Bunny (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11820) Thread safety in logEdit?
Date Sat, 13 May 2017 13:57:04 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tummy Bunny updated HDFS-11820:
-------------------------------
    Description: 
Hi there,

I am new to Hadoop and am trying to understand how things work under the hood by browsing through
some of the code.

I noticed a potential thread-safety issue in FSEditLog.java in version 2.7.1, where the
following pattern is used (the current trunk also uses the same pattern):
1. An instance of FSEditLogOp is retrieved from the cache for reuse.
2. Its attributes are set (e.g. path, timestamp, etc.).
3. logEdit(op) is invoked. This method has a synchronized block in it, but it also has a "wait"
if auto-sync is scheduled.
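
To make the three steps concrete, here is a minimal sketch of the pattern as described above. Every name in it (SketchEditLog, RenameOp, prepareRename, entries) is hypothetical and chosen for illustration; none of it is copied from the real FSEditLog or FSEditLogOp code, and the single shared op instance is an assumption taken from this report rather than something verified against the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the edit log; this is NOT the real FSEditLog.
class SketchEditLog {
    // Hypothetical stand-in for a cached edit-log op.
    static final class RenameOp {
        String src;
        String dst;
        long timestamp;
    }

    // Assumption: one op instance is shared and reused by every caller.
    private final RenameOp cachedRenameOp = new RenameOp();
    private final List<String> journal = new ArrayList<>();

    // Steps 1 and 2: fetch the cached instance and set its attributes.
    // Nothing in this method is synchronized.
    RenameOp prepareRename(String src, String dst, long timestamp) {
        RenameOp op = cachedRenameOp;
        op.src = src;
        op.dst = dst;
        op.timestamp = timestamp;
        return op;
    }

    // Step 3: only the write into the journal is guarded by a lock.
    void logEdit(RenameOp op) {
        synchronized (this) {
            journal.add(op.src + " -> " + op.dst + " @" + op.timestamp);
        }
    }

    // Returns a snapshot of what has been logged so far.
    synchronized List<String> entries() {
        return new ArrayList<>(journal);
    }

    public static void main(String[] args) {
        SketchEditLog log = new SketchEditLog();
        log.logEdit(log.prepareRename("/a", "/b", 1L));
        System.out.println(log.entries()); // prints [/a -> /b @1]
    }
}
```

The point of the sketch is that the lock in logEdit covers only the write, while the attributes are set earlier and outside any lock.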

Now, suppose two rename operations arrive almost simultaneously and each is about to write
to the edit log:
Thread #1 acquires the instance of RenameOp, sets its attributes, and invokes logEdit, then
waits because auto-sync is scheduled.
Thread #2 acquires the same instance of RenameOp, sets *different* attributes, and also invokes
logEdit.
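
The interleaving can be replayed deterministically in a single thread, which avoids depending on timing. This is only a sketch: InterleavingReplay and Op are hypothetical names, and it assumes that both threads really do receive the same op instance, which is exactly the open question in this report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-threaded replay of the suspected interleaving.
class InterleavingReplay {
    static final class Op {
        String src;
        String dst;
    }

    static List<String> replay() {
        Op shared = new Op(); // assumption: one instance handed to both threads
        List<String> journal = new ArrayList<>();

        // Thread #1 sets its attributes, then is assumed to wait before writing.
        shared.src = "/a";
        shared.dst = "/b";

        // Thread #2 gets the same instance, overwrites the attributes, and logs.
        shared.src = "/c";
        shared.dst = "/d";
        journal.add(shared.src + " -> " + shared.dst);

        // Thread #1 resumes and logs the op it holds, which now carries
        // thread #2's values.
        journal.add(shared.src + " -> " + shared.dst);
        return journal;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [/c -> /d, /c -> /d]
    }
}
```

Under that assumption, the rename from /a to /b never reaches the journal and the rename from /c to /d is recorded twice.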

The second rename could then end up being logged twice, because both RenameOps are actually the
same instance.
The fix would be to have a synchronized(anyCachedOp) { ... } block prior to calling logEdit, or to
clone the op (using the cached instance as a template).
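
The two options can be sketched side by side as follows. As before, FixSketch, Op, renameWithLock, renameWithCopy and entries are hypothetical names used only for illustration, and this is not a patch against the real FSEditLog.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two proposed fixes; NOT the real FSEditLog.
class FixSketch {
    static final class Op {
        String src;
        String dst;

        // Creates an independent copy so the caller never mutates the template.
        Op copy() {
            Op c = new Op();
            c.src = src;
            c.dst = dst;
            return c;
        }
    }

    private final Op cachedOp = new Op(); // assumption: shared across threads
    private final List<String> journal = new ArrayList<>();

    private void logEdit(Op op) {
        synchronized (journal) {
            journal.add(op.src + " -> " + op.dst);
        }
    }

    // Option 1: hold the op's monitor from the first attribute write until
    // logEdit returns, so no other thread can modify the op in between.
    void renameWithLock(String src, String dst) {
        synchronized (cachedOp) {
            cachedOp.src = src;
            cachedOp.dst = dst;
            logEdit(cachedOp);
        }
    }

    // Option 2: treat the cached op as a template and work on a private copy.
    void renameWithCopy(String src, String dst) {
        Op op = cachedOp.copy();
        op.src = src;
        op.dst = dst;
        logEdit(op);
    }

    // Returns a snapshot of what has been logged so far.
    List<String> entries() {
        synchronized (journal) {
            return new ArrayList<>(journal);
        }
    }

    public static void main(String[] args) {
        FixSketch log = new FixSketch();
        log.renameWithLock("/a", "/b");
        log.renameWithCopy("/c", "/d");
        System.out.println(log.entries()); // prints [/a -> /b, /c -> /d]
    }
}
```

One trade-off to keep in mind: with option 1, any time logEdit blocks, other callers that need the same op block too, whereas option 2 avoids that at the cost of an extra allocation per edit.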

I could be wrong. Am I missing something?

Thanks,

Alexander Koentjara

  was:
Hi there,

I am new to Hadoop and am trying to understand how things work under the hood by browsing through
some of the code.

I noticed a potential thread-safety issue in FSEditLog.java in version 2.7.1, where the
following pattern is used (the current trunk also uses the same pattern):
1. An instance of FSEditLogOp is retrieved from the cache for reuse.
2. Its attributes are set (e.g. path, timestamp, etc.).
3. logEdit(op) is invoked. This method has a synchronized block in it, but it also has a "wait"
if auto-sync is scheduled.

Now, suppose two rename operations arrive almost simultaneously and each is about to write
to the edit log:
Thread #1 acquires the instance of RenameOp, sets its attributes, and invokes logEdit, then
waits because auto-sync is scheduled.
Thread #2 acquires the same instance of RenameOp, sets *different* attributes, and also invokes
logEdit.

The second rename could then end up being logged twice, because both RenameOps are actually the
same instance.
The fix would be to move synchronized(anyCachedOp) { ... } to before the call to logEdit, or to
clone the op (using the cached instance as a template).

I could be wrong. Am I missing something?

Thanks,

Alexander Koentjara


> Thread safety in logEdit?
> -------------------------
>
>                 Key: HDFS-11820
>                 URL: https://issues.apache.org/jira/browse/HDFS-11820
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Tummy Bunny




