spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Reynold Xin <r...@databricks.com>
Subject Re: Quick question on spark performance
Date Sat, 21 May 2016 00:59:12 GMT
It's probably due to GC.

On Fri, May 20, 2016 at 5:54 PM, Yash Sharma <yash360@gmail.com> wrote:

> Hi All,
> I am here to get some expert advice on a use case I am working on.
>
> Cluster & job details below -
>
> Data - 6 Tb
> Cluster - EMR - 15 Nodes C3-8xLarge (shared by other MR apps)
>
> Parameters-
> --executor-memory 10G \
> --executor-cores 6 \
> --conf spark.dynamicAllocation.enabled=true \
> --conf spark.dynamicAllocation.initialExecutors=15 \
>
> Runtime : 3 Hrs
>
> On monitoring the metrics I notices 10G for executors is not required
> (since I don't have lot of groupings)
>
> Reducing to --executor-memory 3G, Runtime reduced to: 2 Hrs
>
> Question:
> On adding more nodes now has absolutely no effect on the runtime. Is there
> anything I can tune/change/experiment with to make the job faster.
>
> Workload: Mostly reduceBy's and scans.
>
> Would appreciate any insights and thoughts. Best Regards
>
>
>

Mime
View raw message