spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3051) Support looking-up named accumulators in a registry
Date Wed, 17 Sep 2014 22:11:33 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138095#comment-14138095 ]

Apache Spark commented on SPARK-3051:
-------------------------------------

User 'nfergu' has created a pull request for this issue:
https://github.com/apache/spark/pull/2438

> Support looking-up named accumulators in a registry
> ---------------------------------------------------
>
>                 Key: SPARK-3051
>                 URL: https://issues.apache.org/jira/browse/SPARK-3051
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Neil Ferguson
>
> This is a proposed enhancement to Spark based on the following mailing list discussion:
http://apache-spark-developers-list.1001551.n3.nabble.com/quot-Dynamic-variables-quot-in-Spark-td7450.html.
> This proposal builds on SPARK-2380 (Support displaying accumulator values in the web
UI) to allow named accumulables to be looked up in a "registry", as opposed to having to be
passed to every method that needs to access them.
> The use case was described well by [~shivaram], as follows:
> Let's say you have two functions you use
> in a map call and want to measure how much time each of them takes. For
> example, suppose you have a code block like the one below and want to
> measure how much time f1 takes as a fraction of the task.
> {noformat}
> a.map { l => 
>    val f = f1(l) 
>    ... some work here ... 
> } 
> {noformat}
> It would be really cool if we could do something like 
> {noformat}
> a.map { l => 
>    val start = System.nanoTime 
>    val f = f1(l) 
>    TaskMetrics.get("f1-time").add(System.nanoTime - start) 
> } 
> {noformat}
> SPARK-2380 provides a partial solution to this problem. However, the accumulables would
still need to be passed to every function that needs them, which I think would be cumbersome
in any application of reasonable complexity.
> The proposal, as suggested by [~pwendell], is to have a "registry" of accumulables that
can be looked up by name.
> Regarding the implementation details, I'd propose that we broadcast a serialized version
of all named accumulables in the DAGScheduler (similar to what SPARK-2521 does for Tasks).
These can then be deserialized in the Executor. 
> Accumulables are already stored in thread-local variables in the Accumulators object,
so exposing these in the registry should simply be a matter of wrapping this object and keying
the accumulables by name (they are currently keyed by ID).
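
For illustration only, the registry idea described above might look roughly like the following
plain-Scala sketch. It does not use Spark: NamedCounter, AccumulatorRegistry and RegistryDemo
are hypothetical names invented here, and the real Accumulators object and the proposed
TaskMetrics.get API may differ. It only shows a per-thread map keyed by name rather than by ID.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a named accumulable; this is NOT Spark's class.
final class NamedCounter(val id: Long, val name: String) {
  private var total = 0L
  def add(delta: Long): Unit = { total += delta }
  def value: Long = total
}

// Hypothetical registry: one map per thread, keyed by name instead of by ID.
object AccumulatorRegistry {
  private val local = new ThreadLocal[mutable.Map[String, NamedCounter]] {
    override def initialValue(): mutable.Map[String, NamedCounter] =
      mutable.Map.empty[String, NamedCounter]
  }

  // Assumed to be called once per task, after accumulables are deserialized.
  def register(acc: NamedCounter): Unit = local.get().update(acc.name, acc)

  // Look an accumulable up by name; fails loudly if it was never registered.
  def get(name: String): NamedCounter =
    local.get().getOrElse(
      name,
      throw new NoSuchElementException(s"No accumulator named '$name'"))

  def clear(): Unit = local.get().clear()
}

object RegistryDemo {
  def f1(x: Int): Int = x * 2

  // Counts calls to f1 without passing the counter into the closure.
  def run(): Long = {
    AccumulatorRegistry.clear()
    AccumulatorRegistry.register(new NamedCounter(1L, "f1-calls"))
    Seq(1, 2, 3).map { l =>
      val f = f1(l)
      AccumulatorRegistry.get("f1-calls").add(1L)
      f
    }
    AccumulatorRegistry.get("f1-calls").value
  }
}
```

In this sketch the closure only needs the name "f1-calls"; nothing is threaded through
function arguments. Because the map is thread-local, a lookup only sees accumulables
registered on the same thread, which is an assumption about how tasks would be run.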



