spark-user mailing list archives

From <Hussam_Jar...@Dell.com>
Subject RE: question on using spark parallelism vs using num partitions in spark api
Date Tue, 14 Jan 2014 18:35:20 GMT
I am using local
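
(Concretely, the context is created against the plain "local" master string, i.e. a single in-process worker thread; a minimal sketch, with illustrative class and app names:)

import org.apache.spark.api.java.JavaSparkContext;

public class MasterStringExample {
    public static void main(String[] args) {
        // "local" runs Spark in-process with a single worker thread.
        JavaSparkContext sc = new JavaSparkContext("local", "local-test");
        sc.stop();

        // "local[8]" would instead run eight worker threads on this machine.
        JavaSparkContext sc8 = new JavaSparkContext("local[8]", "local-8-test");
        sc8.stop();
    }
}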

Thanks,
Hussam

From: Huangguowei [mailto:huangguowei@huawei.com]
Sent: Tuesday, January 14, 2014 4:43 AM
To: user@spark.incubator.apache.org
Subject: 答复: squestion on using spark parallelism vs using num partitions in spark api

“Using Spark 0.8.1 … Java code running on 8 CPUs with 16 GB RAM, single node”

Local or standalone (single node)?

From: leosandylh@gmail.com
Sent: Tuesday, January 14, 2014 1:42 PM
To: user
Subject: Re: question on using spark parallelism vs using num partitions in spark api

I think the parallelism parameter just controls how many tasks can run concurrently on each worker;
it can't control how many tasks the job is split into.

________________________________
leosandylh@gmail.com

From: Hussam_Jarada@Dell.com
Date: 2014-01-14 09:17
To: user@spark.incubator.apache.org
Subject: squestion on using spark parallelism vs using num partitions in spark api
Hi,

Using Spark 0.8.1 … Java code running on 8 CPUs with 16 GB RAM, single node

It looks like setting spark.default.parallelism with System.setProperty("spark.default.parallelism", "24") before creating my Spark context, as described in http://spark.incubator.apache.org/docs/latest/tuning.html#level-of-parallelism, has no effect on the default number of partitions that Spark uses in its APIs such as saveAsTextFile().
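
For concreteness, the setup looks roughly like this (a minimal sketch, not my actual driver; the data, app name, and output path are illustrative):

import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelismExample {
    public static void main(String[] args) {
        // Must be set before the SparkContext is created (Spark 0.8.x
        // reads it from system properties; note the value is a String).
        System.setProperty("spark.default.parallelism", "24");

        JavaSparkContext sc = new JavaSparkContext("local", "parallelism-test");

        List<Integer> mydata = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // No explicit slice count: the number of partitions falls back to
        // the context's default parallelism.
        JavaRDD<Integer> dataSetRDD = sc.parallelize(mydata);

        // One output file (part-00000, part-00001, ...) per partition.
        dataSetRDD.saveAsTextFile("/tmp/parallelism-test-output");

        sc.stop();
    }
}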

For example, with spark.default.parallelism set to 24 I was expecting 24 tasks to be invoked upon calling saveAsTextFile(), but that is not the case: I am seeing only 1 task get invoked.

If instead I parallelize my RDD with 2 slices, as in
dataSetRDD = SparkDriver.getSparkContext().parallelize(mydata, 2);
and then invoke
dataSetRDD.saveAsTextFile(JavaRddFilePath);

I am seeing 2 tasks get invoked, even though spark.default.parallelism was set to 24.
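
(So the only thing that changes the task count for me is the explicit slice count; with the same names as above, I would expect this to produce 24 save tasks and 24 part-* output files:)

// Explicit slice count: 24 partitions, hence 24 tasks for the save,
// regardless of spark.default.parallelism.
dataSetRDD = SparkDriver.getSparkContext().parallelize(mydata, 24);
dataSetRDD.saveAsTextFile(JavaRddFilePath);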

Can someone explain the above behavior?

Thanks,
Hussam