spark-dev mailing list archives

From "Ulanov, Alexander" <alexander.ula...@hp.com>
Subject Maximum size of vector that reduce can handle
Date Fri, 23 Jan 2015 18:00:05 GMT
Dear Spark developers,

I am trying to measure Spark's reduce performance for big vectors. My motivation is related
to machine learning gradients: the gradient is a vector that is computed on each worker, and then
all the results need to be summed up and broadcast back to the workers. Current machine
learning applications involve very long parameter vectors; for deep neural networks they can
have up to 2 billion elements. So, I want to measure the time needed for this operation depending
on the size of the vector and the number of workers. I wrote a few lines of code that assume that Spark
will distribute the partitions among all available workers. I have a 6-machine cluster (Xeon 3.3GHz,
4 cores, 16GB RAM each), and each machine runs 2 Workers.

import org.apache.spark.mllib.rdd.RDDFunctions._
import breeze.linalg._
import org.apache.log4j._
Logger.getRootLogger.setLevel(Level.OFF)
val n = 60000000   // vector length: 60M doubles, roughly 480 MB per vector
val p = 12         // one partition per worker
val vv = sc.parallelize(0 until p, p).map(i => DenseVector.rand[Double](n))
vv.reduce(_ + _)   // element-wise sum of the 12 random vectors
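
For comparison, a variant of the same sum could use treeReduce, which is available through the
RDDFunctions implicits already imported above; this is only a sketch, not something I have measured:

// Sketch only: treeReduce merges partial sums on the executors in a tree pattern,
// so the driver receives a single vector instead of one partial result per partition.
val sumTree = vv.treeReduce(_ + _, depth = 2)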

When executing this in the shell with a 60M vector, it crashes after some period of time. One of the nodes
has the following in stdout:
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000755500000, 2863661056,
0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 2863661056 bytes for committing reserved
memory.

I run the shell with --executor-memory 8G --driver-memory 8G, so handling a 60M vector of Doubles
should not be a problem. Are there any big overheads involved? What is the maximum size of
vector that reduce can handle?
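
For reference, my back-of-envelope estimate of the data volumes involved (this is an assumption
about where the memory goes, not a measurement):

val bytesPerVector = 60000000L * 8   // one DenseVector[Double] of length 60M is ~480 MB
val partialResults = 12L             // reduce sends one partial sum per partition to the driver
val driverBytes = bytesPerVector * partialResults  // ~5.8 GB at the driver in the worst case, before the final merge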

Best regards, Alexander

P.S. 

"spark.driver.maxResultSize 0" needs to set in order to run this code. I also needed to change
"java.io.tmpdir" and "spark.local.dir" folders because my /tmp folder which is default, was
too small and Spark swaps heavily into this folder. Without these settings I get either "no
space left on device" or "out of memory" exceptions.
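
This is roughly how I launch the shell with those settings (the /data/spark-tmp path is just a
placeholder for a larger local disk on my machines):

spark-shell --driver-memory 8G --executor-memory 8G \
  --conf spark.driver.maxResultSize=0 \
  --conf spark.local.dir=/data/spark-tmp \
  --conf spark.executor.extraJavaOptions=-Djava.io.tmpdir=/data/spark-tmp \
  --driver-java-options "-Djava.io.tmpdir=/data/spark-tmp"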

I also submitted a bug https://issues.apache.org/jira/browse/SPARK-5386

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

