spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe
Date Sun, 16 Jul 2017 13:00:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088920#comment-16088920 ]

Sean Owen commented on SPARK-21425:
-----------------------------------

I see, it's really Main. It looks like having the object declared as a static shared variable
makes the difference, though in principle that wouldn't matter. It could be specific to this
kind of setup, with local execution, but may still be an issue.

On the flip side, I don't see much downside to making writes on these thread-safe. If it's
necessary for correctness, well, it's necessary. If it's not, then it doesn't create contention
between threads, and at most this is paying a cost to acquire a lock to write (which might
be elided, but probably not in this case). CollectionAccumulator is already very nearly
thread-safe anyway (minus setValue). At the moment it seems like that would be good practice, but
I am not 100% clear on why it was not created that way in the first place.
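
To make the trade-off above concrete, here is a minimal sketch of lock-guarded writes, assuming a hypothetical SynchronizedLongSum class. It is not Spark's LongAccumulator and not a proposed patch; the class, its fields, and the runConcurrentAdds helper are illustrative names only. Each add takes the instance lock, so concurrent writers cannot lose updates, and a writer that never shares the object pays only for an uncontended lock acquisition.

```java
// Hypothetical stand-in for an accumulator with lock-guarded writes.
// This is NOT Spark's LongAccumulator; all names here are illustrative.
final class SynchronizedLongSum {
    private long sum = 0L;
    private long count = 0L;

    // The instance lock makes the two read-modify-write steps atomic together.
    synchronized void add(long v) {
        sum += v;
        count += 1;
    }

    synchronized long sum() { return sum; }
    synchronized long count() { return count; }

    // Starts `threads` writers that each add 1 `addsPerThread` times,
    // waits for all of them, and returns the final sum.
    static long runConcurrentAdds(int threads, int addsPerThread) {
        SynchronizedLongSum acc = new SynchronizedLongSum();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    acc.add(1L);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return acc.sum();
    }

    public static void main(String[] args) {
        // 8 writers x 100000 adds of 1 each.
        System.out.println(runConcurrentAdds(8, 100_000)); // prints 800000
    }
}
```

Whether a lock, an atomic, or something else fits best in Spark itself is a separate question; the sketch only shows that guarding the write path is enough to make the total exact.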

> LongAccumulator, DoubleAccumulator not threadsafe
> -------------------------------------------------
>
>                 Key: SPARK-21425
>                 URL: https://issues.apache.org/jira/browse/SPARK-21425
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Ryan Williams
>            Priority: Minor
>
> [AccumulatorV2 docs|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L42-L43]
> acknowledge that accumulators must be concurrent-read-safe, but afaict they must also be concurrent-write-safe.
> The same docs imply that {{Int}} and {{Long}} meet either/both of these criteria, when
> afaict they do not.
> Relatedly, the provided [LongAccumulator|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L291]
> and [DoubleAccumulator|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L370]
> are not thread-safe, and their results should be treated as undefined when multiple concurrent
> tasks on the same executor write to them.
> [Here is a repro repo|https://github.com/ryan-williams/spark-bugs/tree/accum] with some
> simple applications that demonstrate incorrect results from {{LongAccumulator}}s.
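
The failure mode described in the issue can be sketched outside Spark. The example below is a hedged illustration, assuming a hypothetical RacyLongSum class whose add is a plain, unguarded += on a long field; it is not the code from the repro repo and not Spark's LongAccumulator. Because += is a separate read, add, and write, two threads can read the same old value and one update is lost, so the final sum may come out below the number of adds. The exact result varies from run to run and may occasionally be correct.

```java
// Hypothetical accumulator with an unguarded write path.
// This is NOT Spark's LongAccumulator; all names here are illustrative.
final class RacyLongSum {
    private long sum = 0L;

    // Not atomic: read sum, add v, write sum. Concurrent callers can
    // overwrite each other's result and lose updates.
    void add(long v) {
        sum += v;
    }

    long sum() { return sum; }

    // Starts `threads` writers that each add 1 `addsPerThread` times,
    // waits for all of them, and returns whatever sum survived the race.
    static long runConcurrentAdds(int threads, int addsPerThread) {
        RacyLongSum acc = new RacyLongSum();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    acc.add(1L);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return acc.sum();
    }

    public static void main(String[] args) {
        long expected = 8L * 100_000L;
        long actual = runConcurrentAdds(8, 100_000);
        // With several writers, `actual` is frequently smaller than `expected`.
        System.out.println("expected " + expected + ", got " + actual);
    }
}
```

With a single writer the sum is exact, which matches the observation that the problem only appears when several tasks write to the same instance at once.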



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
