spark-commits mailing list archives

From r...@apache.org
Subject spark git commit: [SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator section
Date Wed, 01 Jun 2016 19:57:10 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 beb4ea0b4 -> 47902d4bc


[SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator section

## What changes were proposed in this pull request?

Update the accumulator section of the programming guide (Scala).
The Java and Python versions are not modified because the corresponding APIs are not done yet.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #13441 from WeichenXu123/update_doc_accumulatorV2_clean.

(cherry picked from commit 2402b91461146289a78d6cbc53ed8338543e6de7)
Signed-off-by: Reynold Xin <rxin@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47902d4b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47902d4b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/47902d4b

Branch: refs/heads/branch-2.0
Commit: 47902d4bc60561a2b0f4c7aadfdda394d4e78f75
Parents: beb4ea0
Author: WeichenXu <WeichenXu123@outlook.com>
Authored: Wed Jun 1 12:57:02 2016 -0700
Committer: Reynold Xin <rxin@databricks.com>
Committed: Wed Jun 1 12:57:07 2016 -0700

----------------------------------------------------------------------
 docs/programming-guide.md | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/47902d4b/docs/programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index d375926..70fd627 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1352,41 +1352,42 @@ The code below shows an accumulator being used to add up the elements of an array
 <div data-lang="scala"  markdown="1">
 
 {% highlight scala %}
-scala> val accum = sc.accumulator(0, "My Accumulator")
-accum: org.apache.spark.Accumulator[Int] = 0
+scala> val accum = sc.longAccumulator("My Accumulator")
+accum: org.apache.spark.util.LongAccumulator = LongAccumulator(id: 0, name: Some(My Accumulator), value: 0)
 
-scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
+scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum.add(x))
 ...
 10/09/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s
 
 scala> accum.value
-res2: Int = 10
+res2: Long = 10
 {% endhighlight %}
 
-While this code used the built-in support for accumulators of type Int, programmers can also
-create their own types by subclassing [AccumulatorParam](api/scala/index.html#org.apache.spark.AccumulatorParam).
-The AccumulatorParam interface has two methods: `zero` for providing a "zero value" for your data
-type, and `addInPlace` for adding two values together. For example, supposing we had a `Vector` class
+While this code used the built-in support for accumulators of type Long, programmers can also
+create their own types by subclassing [AccumulatorV2](api/scala/index.html#org.apache.spark.AccumulatorV2).
+The AccumulatorV2 abstract class has several methods which need to be overridden:
+`reset` for resetting the accumulator to zero, `add` for adding another value into the accumulator,
+and `merge` for merging another same-type accumulator into this one. The other methods that need to be
+overridden are covered in the Scala API documentation. For example, supposing we had a `MyVector` class
 representing mathematical vectors, we could write:
 
 {% highlight scala %}
-object VectorAccumulatorParam extends AccumulatorParam[Vector] {
-  def zero(initialValue: Vector): Vector = {
-    Vector.zeros(initialValue.size)
+class VectorAccumulatorV2 extends AccumulatorV2[MyVector, MyVector] {
+  val vec_ : MyVector = MyVector.createZeroVector
+  def reset(): Unit = {
+    vec_.reset()
   }
-  def addInPlace(v1: Vector, v2: Vector): Vector = {
-    v1 += v2
+  def add(v: MyVector): Unit = {
+    vec_.add(v)
   }
+  ...
 }
 
 // Then, create an Accumulator of this type:
-val vecAccum = sc.accumulator(new Vector(...))(VectorAccumulatorParam)
+val myVectorAcc = new VectorAccumulatorV2
+// Then, register it with the Spark context:
+sc.register(myVectorAcc, "MyVectorAcc1")
 {% endhighlight %}
 
-In Scala, Spark also supports the more general [Accumulable](api/scala/index.html#org.apache.spark.Accumulable)
-interface to accumulate data where the resulting type is not the same as the elements added (e.g. build
-a list by collecting together elements), and the `SparkContext.accumulableCollection` method for accumulating
-common Scala collection types.
+Note that when programmers define their own type of AccumulatorV2, the resulting type can be the
+same as, or different from, the type of the elements added.
 
 </div>
 

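The new example in the diff elides several required overrides with `...`. As a sketch of how the complete class might look: `MyVector` is a hypothetical stand-in type (it is not part of Spark), while the override signatures (`isZero`, `copy`, `reset`, `add`, `merge`, `value`) follow the `org.apache.spark.util.AccumulatorV2` abstract class.

```scala
import org.apache.spark.util.AccumulatorV2

// Hypothetical dense-vector type used by the guide's example; not part of Spark.
class MyVector(val values: Array[Double]) extends Serializable {
  def add(other: MyVector): Unit = {
    for (i <- values.indices) values(i) += other.values(i)
  }
  def reset(): Unit = java.util.Arrays.fill(values, 0.0)
  def isZero: Boolean = values.forall(_ == 0.0)
  def copyVec(): MyVector = new MyVector(values.clone())
}

object MyVector {
  def createZeroVector: MyVector = new MyVector(new Array[Double](3))
}

// A complete AccumulatorV2 sketch: every abstract method is overridden.
class VectorAccumulatorV2 extends AccumulatorV2[MyVector, MyVector] {
  private val vec_ : MyVector = MyVector.createZeroVector

  override def isZero: Boolean = vec_.isZero
  override def copy(): VectorAccumulatorV2 = {
    val newAcc = new VectorAccumulatorV2
    newAcc.vec_.add(vec_)  // copy the current partial sum into the new accumulator
    newAcc
  }
  override def reset(): Unit = vec_.reset()
  override def add(v: MyVector): Unit = vec_.add(v)
  override def merge(other: AccumulatorV2[MyVector, MyVector]): Unit =
    vec_.add(other.value)
  override def value: MyVector = vec_.copyVec()
}
```

After `sc.register(new VectorAccumulatorV2, "MyVectorAcc1")`, Spark calls `copy()` and `reset()` to create per-task accumulators and `merge` to combine their results on the driver, which is why all three must be implemented consistently.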

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org

