spark-issues mailing list archives

From "Josh Rosen (JIRA)" <>
Subject [jira] [Commented] (SPARK-4514) SparkContext localProperties does not inherit property updates across thread reuse
Date Tue, 31 Mar 2015 06:12:53 GMT


Josh Rosen commented on SPARK-4514:

I don't know that there's a good way to fix this for all of the arbitrary ways in which users might
create or reuse threads.  The inheritance behavior is somewhat more understandable in cases
where users explicitly create child threads.  Although our documentation doesn't seem to explicitly
promise that properties will be inherited, I think that users may have come to rely on this
behavior, so I don't think that we can remove it at this point.  We can certainly fix it for
the AsyncRDDActions case, though, because we can manually thread the properties through the constructor.
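
As a rough illustration of the thread-reuse pitfall and of what "manually threading the properties" could look like, here is a standalone sketch.  It is not Spark's code: {{ThreadReuseDemo}}, {{localProps}}, {{setJobGroup}}, {{readInPool}} and {{readInPoolPropagated}} are hypothetical names, and an immutable {{Map}} in an {{InheritableThreadLocal}} stands in for {{localProperties}}.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors}

object ThreadReuseDemo {
  // Hypothetical stand-in for SparkContext.localProperties: a new thread
  // receives the creating thread's value once, at thread creation time.
  val localProps = new InheritableThreadLocal[Map[String, String]] {
    override protected def initialValue(): Map[String, String] = Map.empty
  }

  def setJobGroup(id: String): Unit =
    localProps.set(localProps.get + ("jobGroup" -> id))

  // Reads the property on whichever pool thread happens to run the task.
  def readInPool(pool: ExecutorService): Option[String] =
    pool.submit(new Callable[Option[String]] {
      override def call(): Option[String] = localProps.get.get("jobGroup")
    }).get()

  // Captures the submitter's properties and installs them in the worker
  // for the duration of the task, restoring the old value afterwards.
  def readInPoolPropagated(pool: ExecutorService): Option[String] = {
    val captured = localProps.get // evaluated on the submitting thread
    pool.submit(new Callable[Option[String]] {
      override def call(): Option[String] = {
        val previous = localProps.get
        localProps.set(captured)
        try localProps.get.get("jobGroup")
        finally localProps.set(previous)
      }
    }).get()
  }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(1)
    try {
      setJobGroup("group-1")
      println(readInPool(pool))           // Some(group-1): worker created now, inherits
      setJobGroup("group-2")
      println(readInPool(pool))           // Some(group-1): worker reused, update not seen
      println(readInPoolPropagated(pool)) // Some(group-2): explicitly propagated
    } finally pool.shutdown()
  }
}
```

The second read shows the stale value because the pool's single worker thread was created (and took its inherited copy) before the update.  Capturing the value on the submitting thread avoids depending on when the worker was created.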

This pain could probably have been avoided if the original design had used something like Scala's
{{DynamicVariable}}, which forces you to explicitly consider the scope / lifecycle of the
thread-local property.

I'm going to try to fix this for the AsyncRDDActions case and will try to improve the documentation
to warn about this pitfall for the more general cases involving arbitrary user code.  Let
me know if you can spot another solution that won't break existing user code that relies
on property inheritance in the non-thread-reuse cases.
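
To make the {{DynamicVariable}} remark concrete, here is a minimal sketch of the scoping style it encourages.  {{DynamicVariableDemo}}, {{jobGroup}} and {{observe}} are hypothetical names for illustration only, not a proposed Spark API.  Note that {{scala.util.DynamicVariable}} is itself backed by an {{InheritableThreadLocal}}, so it would not fix thread reuse on its own; the point is only that every binding has an explicit, lexically visible scope.

```scala
import scala.util.DynamicVariable

object DynamicVariableDemo {
  // Hypothetical job-group holder; None means "no group set".
  val jobGroup = new DynamicVariable[Option[String]](None)

  // Records the value seen before, inside, nested inside, and after a scope.
  def observe(): List[Option[String]] = {
    val before = jobGroup.value
    val (inner, nested, restored) = jobGroup.withValue(Some("my-job-group")) {
      val inner = jobGroup.value
      val nested = jobGroup.withValue(Some("nested-group")) { jobGroup.value }
      // Leaving the nested block restores the enclosing binding.
      (inner, nested, jobGroup.value)
    }
    // Leaving the outer block restores the default.
    val after = jobGroup.value
    List(before, inner, nested, restored, after)
  }

  def main(args: Array[String]): Unit =
    observe().foreach(println)
}
```

Because the value is only bound inside {{withValue}}, there is no lingering setting that a later task on the same thread could pick up by accident.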

> SparkContext localProperties does not inherit property updates across thread reuse
> ----------------------------------------------------------------------------------
>                 Key: SPARK-4514
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0, 1.1.1, 1.2.0
>            Reporter: Erik Erlandson
>            Assignee: Josh Rosen
>            Priority: Critical
> The current job group id of a Spark context is stored in the {{localProperties}} member
> value.  This data structure is designed to be thread local, and its settings are not preserved
> when {{ComplexFutureAction}} instantiates a new {{Future}}.
> One consequence of this is that {{takeAsync()}} does not behave in the same way as other
> async actions, e.g. {{countAsync()}}.  For example, this test (if copied into StatusTrackerSuite.scala)
> will fail, because {{"my-job-group2"}} is not propagated to the Future which actually instantiates
> the job:
> {code:scala}
>   test("getJobIdsForGroup() with takeAsync()") {
>     sc = new SparkContext("local", "test", new SparkConf(false))
>     sc.setJobGroup("my-job-group2", "description")
>     sc.statusTracker.getJobIdsForGroup("my-job-group2") should be (Seq.empty)
>     val firstJobFuture = sc.parallelize(1 to 1000, 1).takeAsync(1)
>     val firstJobId = eventually(timeout(10 seconds)) {
>       firstJobFuture.jobIds.head
>     }
>     eventually(timeout(10 seconds)) {
>       sc.statusTracker.getJobIdsForGroup("my-job-group2") should be (Seq(firstJobId))
>     }
>   }
> {code}
> It also impacts the current PR for SPARK-1021, which involves additional uses of {{ComplexFutureAction}}.

This message was sent by Atlassian JIRA
