spark-user mailing list archives

From Yadid Ayzenberg <>
Subject Re: spark performance non-linear response
Date Wed, 07 Oct 2015 15:42:26 GMT
Additional missing relevant information:

I'm running a transformation; no shuffles occur, and at the 
end I'm performing a lookup of 4 partitions on the driver.

On 10/7/15 11:26 AM, Yadid Ayzenberg wrote:
> Hi All,
> I'm using Spark 1.4.1 to analyze a largish data set (several 
> gigabytes of data). The RDD is split into 2048 partitions, which 
> are roughly equal in size and entirely cached in RAM.
> I evaluated the performance on several cluster sizes and am 
> seeing a non-linear (power-law) performance improvement as the cluster 
> size increases (plot below). Each node has 4 cores, and each worker is 
> configured to use 10 GB of RAM.
> [Plot: Spark performance]
> I would expect a more linear response given the number of partitions 
> and the fact that all of the data is cached.
> Can anyone suggest what I should tweak in order to improve the 
> performance?
> Or perhaps provide an explanation for the behavior I'm witnessing?
> Yadid
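
As a rough check on the setup described in the quoted message, here is a minimal sketch of the task-wave arithmetic. It assumes one task per partition, one task per core at a time, equal task durations, and no scheduling or driver-side lookup overhead; the message confirms none of these, and the function names are hypothetical. Under those assumptions the runtime is proportional to the number of task waves, ceil(2048 / (4 * nodes)), so the ideal runtime curve would itself fall off roughly as 1/N rather than along a straight line.

```python
import math

PARTITIONS = 2048      # partition count stated in the message
CORES_PER_NODE = 4     # cores per node stated in the message


def task_waves(nodes, partitions=PARTITIONS, cores_per_node=CORES_PER_NODE):
    """Scheduling rounds needed to run one task per partition.

    Assumption: every core runs exactly one task at a time and all
    tasks take the same time, so runtime scales with this count.
    """
    slots = nodes * cores_per_node
    return math.ceil(partitions / slots)


def ideal_speedup(nodes, baseline_nodes=1):
    """Speedup relative to a baseline cluster under the same assumptions."""
    return task_waves(baseline_nodes) / task_waves(nodes)


if __name__ == "__main__":
    # Cluster sizes here are illustrative, not the ones measured in the thread.
    for nodes in (1, 2, 3, 4, 8, 16, 32):
        print(nodes, task_waves(nodes), round(ideal_speedup(nodes), 2))
```

Comparing measured runtimes against `task_waves(nodes)` for the actual cluster sizes would show whether the observed curve departs from this idealized 1/N shape or simply follows it.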
