hadoop-hdfs-issues mailing list archives

From "James Clampffer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11436) libhdfs++: Fix race condition in ScopedResolver
Date Wed, 22 Feb 2017 17:30:44 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-11436:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Thanks for the review, [~xiaowei.zhu]. I committed this to HDFS-8707.

Because this bug is non-deterministic, testing was done with the application where
the failures were discovered rather than with the cancel examples, which did not have
enough concurrency to trigger the race.

> libhdfs++: Fix race condition in ScopedResolver
> -----------------------------------------------
>
>                 Key: HDFS-11436
>                 URL: https://issues.apache.org/jira/browse/HDFS-11436
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>         Attachments: HDFS-11436.HDFS-8707.000.patch
>
>
> ScopedResolver holds a shared_ptr to a std::promise so that asynchronous work can run while
the calling thread blocks.  The spec allows promise::set_value to unblock future::wait immediately,
so future::get races to read whatever the promise contains before the state and value owned
by the promise are destroyed.  The ScopedResolver holds a shared_ptr to the promise, which
looks like it would prevent this, but wait can unblock fast enough for the ScopedResolver
destructor to run, which ends up destroying the promise.  GCC's implementation of promises
and futures can hit this; it shows up in valgrind as a bunch of invalid reads when the timing
is right to set up the race.
> The simple fix for now is to make sure the callback captures a shared_ptr to the promise.
 A longer-term fix may be a class that encapsulates the promise and future with a reference
counting mechanism, used instead of the std lib versions, to prevent more of these bugs.
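
The simple fix described above can be sketched roughly as follows. This is not the libhdfs++ code: ScopedOperation, MakeCallback, Wait, and RunOnce are hypothetical names, and a plain std::thread stands in for whatever executor runs the resolver callback. The only point illustrated is that the callback copies the shared_ptr, so the promise stays alive until set_value has returned, even if the owning object is destroyed first.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical stand-in for ScopedResolver (not the libhdfs++ class).
class ScopedOperation {
 public:
  ScopedOperation()
      : promise_(std::make_shared<std::promise<int>>()),
        future_(promise_->get_future()) {}

  // Builds the completion callback an async executor would invoke.
  // The lambda copies the shared_ptr, so it co-owns the promise and the
  // promise outlives this object for as long as the callback exists.
  std::function<void(int)> MakeCallback() {
    std::shared_ptr<std::promise<int>> keep_alive = promise_;
    return [keep_alive](int result) { keep_alive->set_value(result); };
  }

  // Blocks the calling thread until the callback has delivered a value.
  int Wait() { return future_.get(); }

 private:
  std::shared_ptr<std::promise<int>> promise_;  // declared first: initialized first
  std::future<int> future_;
};

// Runs one async operation and destroys the owner as soon as the wait returns.
int RunOnce(int value) {
  std::thread worker;
  int result = 0;
  {
    ScopedOperation op;
    worker = std::thread(op.MakeCallback(), value);
    result = op.Wait();
  }  // op is destroyed here, possibly before set_value has returned on worker
  worker.join();
  return result;
}
```

Calling RunOnce(42) returns 42. If the lambda captured a raw pointer or a reference to the owner instead, the destructor at the end of the inner scope could free the promise while the worker thread was still inside set_value, which is the kind of invalid read the description mentions.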




