spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (SPARK-14880) Parallel Gradient Descent with less map-reduce shuffle overhead
Date Mon, 25 Apr 2016 20:34:13 GMT

     [ https://issues.apache.org/jira/browse/SPARK-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley closed SPARK-14880.
-------------------------------------
    Resolution: Won't Fix

> Parallel Gradient Descent with less map-reduce shuffle overhead
> ---------------------------------------------------------------
>
>                 Key: SPARK-14880
>                 URL: https://issues.apache.org/jira/browse/SPARK-14880
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Ahmed Mahran
>              Labels: performance
>
> The current implementation of (Stochastic) Gradient Descent performs one map-reduce shuffle per iteration. Moreover, when the sampling fraction gets smaller, the algorithm becomes shuffle-bound instead of CPU-bound.
> {code}
> (1 to numIterations or convergence) {
>  rdd
>   .sample(fraction)
>   .map(Gradient)
>   .reduce(Update)
> }
> {code}
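> For concreteness, here is a minimal runnable Scala sketch of this per-iteration pattern (the squared-loss gradient and the name {{minibatchSGD}} are illustrative assumptions, not MLlib's actual implementation):
> {code}
> import org.apache.spark.rdd.RDD
>
> // One distributed pass per iteration: sample the RDD, map each example
> // to a gradient, and reduce the gradients back to the driver.
> def minibatchSGD(
>     data: RDD[(Double, Array[Double])], // (label, features)
>     numIterations: Int,
>     fraction: Double,
>     stepSize: Double): Array[Double] = {
>   var w = Array.fill(data.first()._2.length)(0.0)
>   for (i <- 1 to numIterations) {
>     val wB = data.sparkContext.broadcast(w)
>     // gradient of the squared loss for one example: (w . x - y) * x
>     // (assumes the sample is non-empty; a real implementation would guard this)
>     val (gradSum, count) = data
>       .sample(false, fraction, 42 + i) // without replacement, fresh seed per pass
>       .map { case (y, x) =>
>         val err = x.zip(wB.value).map { case (xi, wi) => xi * wi }.sum - y
>         (x.map(_ * err), 1L)
>       }
>       .reduce { case ((g1, c1), (g2, c2)) =>
>         (g1.zip(g2).map { case (a, b) => a + b }, c1 + c2)
>       }
>     w = w.zip(gradSum).map { case (wi, gi) => wi - stepSize * gi / count }
>     wB.unpersist()
>   }
>   w
> }
> {code}
> Note that every pass of the loop triggers a full sample-map-reduce over the cluster, which is exactly the per-iteration overhead described above.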
> A more performant variation requires only one map-reduce shuffle regardless of the number of iterations: a local mini-batch SGD could be run on each partition, then the results averaged. This is based on (Zinkevich, Martin, Markus Weimer, Lihong Li, and Alex J. Smola. "Parallelized stochastic gradient descent." In Advances in Neural Information Processing Systems, 2010, http://www.research.rutgers.edu/~lihong/pub/Zinkevich11Parallelized.pdf).
> {code}
> rdd
>  .shuffle()
>  .mapPartitions((1 to numIterations or convergence) {
>    iter.sample(fraction).map(Gradient).reduce(Update)
>  })
>  .reduce(Average)
> {code}
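> A hedged sketch of this single-shuffle variant (again assuming a squared loss; {{parallelSGD}} is an illustrative name, not the proposed API):
> {code}
> import org.apache.spark.rdd.RDD
> import scala.util.Random
>
> // Zinkevich-style parallel SGD: one repartition (the only shuffle),
> // an independent local SGD inside each partition, one average at the end.
> def parallelSGD(
>     data: RDD[(Double, Array[Double])],
>     numIterations: Int,
>     stepSize: Double,
>     numPartitions: Int): Array[Double] = {
>   val numFeatures = data.first()._2.length
>   val (sum, cnt) = data
>     .repartition(numPartitions) // the single shuffle
>     .mapPartitions { iter =>
>       val examples = iter.toArray
>       val rnd = new Random(0)
>       var w = Array.fill(numFeatures)(0.0)
>       // plain SGD over random samples of this partition's data
>       for (_ <- 1 to numIterations if examples.nonEmpty) {
>         val (y, x) = examples(rnd.nextInt(examples.length))
>         val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
>         w = w.zip(x).map { case (wi, xi) => wi - stepSize * err * xi }
>       }
>       Iterator((w, 1L))
>     }
>     .reduce { case ((w1, c1), (w2, c2)) =>
>       (w1.zip(w2).map { case (a, b) => a + b }, c1 + c2)
>     }
>   sum.map(_ / cnt) // average of the per-partition models
> }
> {code}
> Only {{repartition}} moves data across the network; all SGD iterations run partition-locally, so the algorithm stays CPU-bound even for small sampling fractions.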
> A higher-level iteration could enclose the above variation, shuffling the data before the local mini-batches and feeding the averaged weights from the last iteration back in. This allows more variability in the sampling of the mini-batches, with the possibility of covering the whole dataset. Here is a Spark-based implementation: https://github.com/mashin-io/rich-spark/blob/master/src/main/scala/org/apache/spark/mllib/optimization/ParallelSGD.scala
> {code}
> (1 to numIterations1 or convergence) {
>  rdd
>   .shuffle()
>   .mapPartitions((1 to numIterations2 or convergence) {
>     iter.sample(fraction).map(Gradient).reduce(Update)
>   })
>   .reduce(Average)
> }
> {code}
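> A sketch of this outer loop under the same assumptions; the averaged weights are broadcast back as the warm start for the next round:
> {code}
> import org.apache.spark.rdd.RDD
> import scala.util.Random
>
> // Outer loop: each round reshuffles, runs local SGD warm-started from the
> // previous round's averaged weights, then re-averages.
> def iteratedParallelSGD(
>     data: RDD[(Double, Array[Double])],
>     outerIterations: Int,
>     innerIterations: Int,
>     stepSize: Double,
>     numPartitions: Int): Array[Double] = {
>   val numFeatures = data.first()._2.length
>   var w = Array.fill(numFeatures)(0.0)
>   for (round <- 1 to outerIterations) {
>     val wB = data.sparkContext.broadcast(w)
>     val (sum, cnt) = data
>       .repartition(numPartitions) // reshuffle between rounds
>       .mapPartitions { iter =>
>         val examples = iter.toArray
>         val rnd = new Random(round)
>         var lw = wB.value.clone() // warm start from the averaged weights
>         for (_ <- 1 to innerIterations if examples.nonEmpty) {
>           val (y, x) = examples(rnd.nextInt(examples.length))
>           val err = x.zip(lw).map { case (xi, wi) => xi * wi }.sum - y
>           lw = lw.zip(x).map { case (wi, xi) => wi - stepSize * err * xi }
>         }
>         Iterator((lw, 1L))
>       }
>       .reduce { case ((w1, c1), (w2, c2)) =>
>         (w1.zip(w2).map { case (a, b) => a + b }, c1 + c2)
>       }
>     w = sum.map(_ / cnt)
>     wB.unpersist()
>   }
>   w
> }
> {code}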



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

