spark-user mailing list archives

From Yotto Koga <yotto.k...@autodesk.com>
Subject RE: configure to run multiple tasks on a core
Date Thu, 27 Nov 2014 01:01:07 GMT
Thanks Sean. That worked out well.

For anyone who happens onto this post and wants to do the same, these are the steps I took
to follow Sean's suggestion...

(Note this is for a standalone cluster.)

login to the master

~/spark/sbin/stop-all.sh

edit ~/spark/conf/spark-env.sh

change the line
export SPARK_WORKER_INSTANCES=1
to set the number of worker instances you want per node (e.g. 2)

I also added
export SPARK_WORKER_MEMORY=<some reasonable value>
so that the total memory used by all the workers on a node stays within the memory
available on that node (e.g. 2g)
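
For reference, after the edits the relevant lines of spark-env.sh looked something like
this (the values here are just examples; size them to your own nodes):

export SPARK_WORKER_INSTANCES=2
export SPARK_WORKER_MEMORY=2g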

~/spark-ec2/copy-dir /root/spark/conf
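(copy-dir pushes the updated conf directory out to all the slave nodes so they pick up
the change)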

~/spark/sbin/start-all.sh


________________________________________
From: Sean Owen [sowen@cloudera.com]
Sent: Wednesday, November 26, 2014 12:14 AM
To: Yotto Koga
Cc: user@spark.apache.org
Subject: Re: configure to run multiple tasks on a core

What about running, say, 2 executors per machine, each of which thinks
it should use all cores?

You can also multi-thread your map function manually within your own
code, with careful use of a java.util.concurrent.Executor.
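
For that second approach, here is a minimal sketch in Scala of what a multi-threaded map
stage could look like, assuming rdd holds the input records and runExternalApp is a
hypothetical wrapper around the C++ app call (tune the pool size to how many concurrent
app runs a core can absorb):

import java.util.concurrent.{Callable, Executors}

val results = rdd.mapPartitions { records =>
  // One pool per partition: several external-app calls run concurrently,
  // so the download/upload wait time overlaps instead of idling the core.
  val pool = Executors.newFixedThreadPool(4)  // e.g. 4 concurrent calls
  try {
    val futures = records.map { record =>
      pool.submit(new Callable[String] {
        override def call(): String = runExternalApp(record)  // hypothetical wrapper
      })
    }.toList                        // force all submissions up front
    futures.map(_.get()).iterator   // block until every call has finished
  } finally {
    pool.shutdown()
  }
}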

On Wed, Nov 26, 2014 at 6:57 AM, yotto <yotto.koga@autodesk.com> wrote:
> I'm running a spark-ec2 cluster.
>
> I have a map task that calls a specialized C++ external app. The app doesn't
> fully utilize the core as it needs to download/upload data as part of the
> task. Looking at the worker nodes, it appears that there is one task with my
> app running per core.
>
> I'd like to better utilize the cpu resources with the hope of increasing
> throughput by running multiple tasks (with my app) per core in parallel.
>
> I see there is a spark.task.cpus config setting with a default value of 1.
> It appears, though, that this setting goes in the opposite direction from
> what I am looking for.
>
> Is there a way where I can specify multiple tasks per core rather than
> multiple cores per task?
>
> thanks for any help.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

