spark-dev mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject A couple questions about shared variables
Date Sat, 20 Sep 2014 15:50:26 GMT
Hey All,

A couple of questions about shared variables came up recently, and I wanted to
confirm my understanding and then update the docs to be a little clearer.

*Broadcast variables*
Now that task data is automatically broadcast, the only cases where it
makes sense to broadcast explicitly are:
* You want to use a variable from tasks in multiple stages.
* You want to have the variable stored on the executors in deserialized
form.
* You want tasks to be able to modify the variable and have those
modifications take effect for other tasks running on the same executor
(usually a very bad idea).

Is that right?
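
To make sure we're talking about the same thing, here's roughly the kind of
explicit broadcast I mean.  This is just a sketch, assuming a SparkContext
named sc and an RDD[String] named records (both hypothetical):

  import org.apache.spark.SparkContext._  // for reduceByKey on pair RDDs

  // Ship the lookup table to each executor once and keep it there in
  // deserialized form, instead of serializing it into every task closure.
  val lookup = Map("a" -> 1, "b" -> 2)
  val lookupBc = sc.broadcast(lookup)

  // First stage: read the broadcast value inside a task.
  val mapped = records.map(r => lookupBc.value.getOrElse(r, 0))

  // Second stage (after the shuffle): the same broadcast value is reused
  // by tasks without being shipped again.
  val counts = mapped.map(v => (v, 1)).reduceByKey(_ + _)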

*Accumulators*
Values are only counted for successful tasks.  Is that right?  KMeans seems
to use them in this way.  What happens if a node goes away and successful
tasks need to be resubmitted?  Or if the stage runs again because a different
job needs it?
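
For concreteness, this is the kind of usage I have in mind, where you'd hope
the count reflects each input exactly once even if tasks get resubmitted.  A
rough sketch, assuming a SparkContext named sc:

  // Count malformed records on the side while transforming the data.
  val badRecords = sc.accumulator(0)
  val data = sc.parallelize(Seq("1", "2", "oops", "4"))
  val nums = data.flatMap { s =>
    try Some(s.toInt)
    catch { case _: NumberFormatException => badRecords += 1; None }
  }
  nums.count()                // accumulator updates are applied when the action runs
  println(badRecords.value)   // expect 1 if only successful tasks are counted once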

thanks,
Sandy
