spark-dev mailing list archives

From Yi Tian <tianyi.asiai...@gmail.com>
Subject Re: How to use multi thread in RDD map function ?
Date Sun, 28 Sep 2014 11:11:41 GMT
For yarn-client mode:

SPARK_EXECUTOR_CORES * SPARK_EXECUTOR_INSTANCES = 2 (or 3) * TotalCoresOnYourCluster

For standalone mode:

SPARK_WORKER_INSTANCES * SPARK_WORKER_CORES = 2 (or 3) * TotalCoresOnYourCluster
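A worked example of the rule above, as a small sketch under assumptions not in the original message: a hypothetical cluster of 4 nodes with 16 cores each (the 16 cores per node come from the question quoted below; the node count and the 8-cores-per-executor split are made up). The object and function names are illustrative only. The same arithmetic applies to SPARK_WORKER_INSTANCES * SPARK_WORKER_CORES in standalone mode.

```scala
object CoreOversubscription {
  // Total cores to advertise: factor (2 or 3 in the rule of thumb) times physical cores.
  def coresToAdvertise(nodes: Int, coresPerNode: Int, factor: Int): Int =
    factor * nodes * coresPerNode

  // Split that total into (cores per executor, executor instances) for yarn-client mode.
  def executorLayout(totalCores: Int, coresPerExecutor: Int): (Int, Int) = {
    require(totalCores % coresPerExecutor == 0, "cores per executor must divide the total")
    (coresPerExecutor, totalCores / coresPerExecutor)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical cluster: 4 nodes x 16 physical cores, oversubscribed by a factor of 2.
    val total = coresToAdvertise(nodes = 4, coresPerNode = 16, factor = 2)
    val (cores, instances) = executorLayout(total, coresPerExecutor = 8)
    // cores * instances == 128, twice the 64 physical cores
    println(s"SPARK_EXECUTOR_CORES=$cores SPARK_EXECUTOR_INSTANCES=$instances")
  }
}
```

Advertising more cores than the hardware has makes Spark schedule more concurrent tasks per node, so a core left idle by one task (for example while it waits on shuffle fetches) can be used by another.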



Best Regards,

Yi Tian
tianyi.asiainfo@gmail.com




On Sep 28, 2014, at 17:59, myasuka <myasuka@live.com> wrote:

> Hi, everyone
>    I have run into a problem with increasing concurrency. In my
> program, after the shuffle write, each node has to fetch 16 pairs of
> matrices and multiply each pair, for example:
> 
> import breeze.linalg.{DenseMatrix => BDM}
>
> // BlockID is an application-defined key class with row and column
> // fields (definition not shown)
> pairs.map { case (id, (m1, m2)) =>
>   val b1 = m1.asInstanceOf[BDM[Double]]
>   val b2 = m2.asInstanceOf[BDM[Double]]
>   val c: BDM[Double] = b1 * b2 // dense matrix product
>   (new BlockID(id.row, id.column), c)
> }
> 
>    Each node has 16 cores. However, whether I set 16 tasks or more per
> node, CPU utilization never rises above 60%, which means not every core on
> the node is computing. When I checked the tasks in the web UI, their
> shuffle read and write sizes showed that some tasks did one matrix
> multiplication, some did two, and some did none.
> 
>    So I thought of using Java multithreading to increase concurrency. I
> wrote a Scala program that uses Java threads on a single node without
> Spark, and watching it in 'top', I found it could use up to 1500% CPU
> (meaning nearly every core was computing). But I have no idea how to use
> Java multithreading inside an RDD transformation.
> 
>    Can anyone provide example code that uses Java multithreading in an
> RDD transformation, or suggest another way to increase the concurrency?
>
> Thanks to all
> 
> 
> 
> 
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-use-multi-thread-in-RDD-map-function-tp8583.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
> 
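The uneven load described in the question (some tasks doing two multiplications, some none) is consistent with how Spark's default HashPartitioner places keys: the partition is the key's hashCode modulo the partition count, made non-negative, so with only 16 block keys two keys can share a partition while another stays empty. A minimal Spark-free sketch of the difference, assuming the blocks form a 4 x 4 grid; `Block` is a hypothetical stand-in for the poster's BlockID, and the other names are illustrative. In a real job the grid-aware function would be the body of `getPartition` in a custom `org.apache.spark.Partitioner` subclass.

```scala
object BlockPlacement {
  // Hypothetical stand-in for the poster's BlockID (row and column of a matrix block).
  final case class Block(row: Int, column: Int)

  // Hash-style placement, mirroring HashPartitioner: hashCode modulo n, made non-negative.
  def hashPartition(b: Block, numPartitions: Int): Int = {
    val raw = b.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Grid-aware placement: row-major block index modulo n.
  def gridPartition(b: Block, numBlockCols: Int, numPartitions: Int): Int =
    (b.row * numBlockCols + b.column) % numPartitions

  // Number of blocks that each partition receives under a given placement.
  def load(blocks: Seq[Block], numPartitions: Int)(place: Block => Int): Seq[Int] = {
    val counts = Array.fill(numPartitions)(0)
    blocks.foreach(b => counts(place(b)) += 1)
    counts.toSeq
  }

  def main(args: Array[String]): Unit = {
    val blocks = for (r <- 0 until 4; c <- 0 until 4) yield Block(r, c)
    val n = 16
    println("hash: " + load(blocks, n)(hashPartition(_, n)).mkString(" "))
    println("grid: " + load(blocks, n)(gridPartition(_, 4, n)).mkString(" "))
  }
}
```

With 16 blocks and 16 partitions the grid placement gives every partition exactly one multiplication; the hash placement depends on the key's hashCode and may leave some partitions empty.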


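On the question of using threads inside a transformation, one common pattern is to thread per partition: pass a function to `mapPartitions`, run each element's work on a local thread pool, and wait for all results before returning an iterator. The sketch below is Spark-free so it runs on its own; it uses scala.concurrent Futures over a fixed Java thread pool, and a naive Array-based product stands in for Breeze's `b1 * b2`. The names `ParallelBlocks`, `Mat`, and `multiplyAll` are hypothetical, and a call site such as `pairs.mapPartitions(it => ParallelBlocks.multiplyAll(it, 4))` is an assumption, not something from the thread.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelBlocks {
  type Mat = Array[Array[Double]]

  // Naive dense product; a stand-in for Breeze's b1 * b2.
  def multiply(a: Mat, b: Mat): Mat = {
    val n = a.length; val k = b.length; val m = b(0).length
    require(a.forall(_.length == k), "inner dimensions must agree")
    Array.tabulate(n, m)((i, j) => (0 until k).map(p => a(i)(p) * b(p)(j)).sum)
  }

  // Multiplies every (key, (left, right)) pair of one partition on a local pool.
  def multiplyAll[K](it: Iterator[(K, (Mat, Mat))], threads: Int): Iterator[(K, Mat)] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // toList forces the iterator, so all multiplications are submitted before waiting.
      val futures = it.map { case (key, (l, r)) => Future((key, multiply(l, r))) }.toList
      // Future.sequence keeps the input order; block until every product is done.
      Await.result(Future.sequence(futures), Duration.Inf).iterator
    } finally pool.shutdown() // release the pool's threads once this partition is finished
  }
}
```

Waiting for every future before returning keeps the output order and lets the pool shut down before Spark consumes the iterator. The cost is that the whole partition is held in memory, and Spark does not count these extra threads when it schedules tasks, so they compete with other task slots for the same cores.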