spark-user mailing list archives

From saluc <>
Subject PySpark: breakdown application execution time and fine-tuning
Date Sat, 17 Oct 2015 10:10:39 GMT

I am using PySpark to develop my big-data application. I have the impression
that most of my application's execution time is spent on the infrastructure
(distributing the code and the data across the cluster, IPC between the Python
processes and the JVM) rather than on the computation itself. In particular, I
would be interested in measuring the time spent in the IPC between the Python
processes and the JVM.

Is there a way to break down the execution time so as to see how much time is
effectively spent in the different phases of execution? In other words, I am
looking for some kind of detailed profiling of the execution time, to gather
more information for fine-tuning the application.
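(For context, one generic way to get such a breakdown on the Python side is the standard-library cProfile module; the sketch below is illustrative only, and the `compute` and `serialize` function names are hypothetical stand-ins for the computation and IPC-serialization phases, not PySpark internals. PySpark itself also has a built-in profiler that can be enabled with the `spark.python.profile` configuration option and inspected via `SparkContext.show_profiles()`.)

```python
import cProfile
import io
import pickle
import pstats

def compute(data):
    # Hypothetical stand-in for the actual computation phase.
    return [x * x for x in data]

def serialize(data):
    # Hypothetical stand-in for infrastructure overhead:
    # PySpark pickles data when moving it between Python and the JVM.
    return pickle.dumps(data)

# Profile both phases together, then break the time down per function.
profiler = cProfile.Profile()
profiler.enable()
data = list(range(100_000))
result = compute(data)
blob = serialize(result)
profiler.disable()

# Render the per-function breakdown, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

The report attributes time to each function, so the "computation" and "serialization" phases show up as separate rows and can be compared directly.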

Thank you very much for your help and support,
