spark-user mailing list archives

From "Shao, Saisai" <saisai.s...@intel.com>
Subject RE: [spark-streaming] can shuffle write to disk be disabled?
Date Thu, 19 Mar 2015 00:58:33 GMT
Please see the inline comments.

Thanks
Jerry

From: Darren Hoo [mailto:darren.hoo@gmail.com]
Sent: Wednesday, March 18, 2015 9:30 PM
To: Shao, Saisai
Cc: user@spark.apache.org; Akhil Das
Subject: Re: [spark-streaming] can shuffle write to disk be disabled?



On Wed, Mar 18, 2015 at 8:31 PM, Shao, Saisai <saisai.shao@intel.com> wrote:

From the log you pasted, I think this file (-rw-r--r--  1 root root  80K Mar 18 16:54 shuffle_47_519_0.data)
is not shuffle spill data, but the final shuffle output.

Why is the shuffle result written to disk?

This is how Spark's shuffle works internally: the map-side output is written to local files,
which the reduce tasks then fetch.



As I said, do you think the shuffle is the bottleneck that makes your job run slowly?

I am quite new to Spark, so I am just making wild guesses. What further information should I
provide to help find the real bottleneck?

You can monitor the system metrics as well as the JVM; the information from the web UI is also
very useful.



Maybe you should identify the cause first. Besides, from the log it looks like your memory is
not enough to cache the data; maybe you should increase the memory size of the executor.



Running two executors, the memory usage is quite low:

executor 0  8.6 MB / 4.1 GB
executor 1  23.9 MB / 4.1 GB
<driver>     0.0B / 529.9 MB
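
To put the figures above in proportion, here is a minimal sketch that turns each "used / total" cell into a fraction. The helper names (to_bytes, usage_fraction) are made up for illustration, and it assumes the UI reports sizes in binary units (1 GB = 1024 MB). With the three values listed, every fraction comes out well under 1%.

```python
import re

# Binary unit multipliers; assumes the UI reports sizes in powers of 1024.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(text):
    """Convert a size such as '8.6 MB' or '0.0B' into a number of bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMG]?B)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit]


def usage_fraction(cell):
    """Return used/total for a cell such as '8.6 MB / 4.1 GB'."""
    used, total = cell.split("/")
    return to_bytes(used) / to_bytes(total)


# Values copied from the listing above.
cells = {
    "executor 0": "8.6 MB / 4.1 GB",
    "executor 1": "23.9 MB / 4.1 GB",
    "<driver>": "0.0B / 529.9 MB",
}

for name, cell in cells.items():
    print(f"{name}: {usage_fraction(cell):.2%} of storage memory in use")
```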


Submitted with args: --executor-memory 8G --num-executors 2 --driver-memory 1G
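
A note on why the UI shows 4.1 GB per executor although --executor-memory is 8G: the figure is most likely the storage (cache) portion of the heap rather than the whole heap. The sketch below assumes the Spark 1.x defaults spark.storage.memoryFraction = 0.6 and spark.storage.safetyFraction = 0.9; check both against your version. The function names are hypothetical, and the heap the JVM actually reports is an estimate here, not a measured value.

```python
# Assumed Spark 1.x defaults (legacy memory manager); verify for your version.
STORAGE_MEMORY_FRACTION = 0.6  # spark.storage.memoryFraction
STORAGE_SAFETY_FRACTION = 0.9  # spark.storage.safetyFraction


def storage_memory_gb(usable_heap_gb,
                      memory_fraction=STORAGE_MEMORY_FRACTION,
                      safety_fraction=STORAGE_SAFETY_FRACTION):
    """Storage memory available for caching, given the usable JVM heap."""
    return usable_heap_gb * memory_fraction * safety_fraction


def implied_heap_gb(storage_gb,
                    memory_fraction=STORAGE_MEMORY_FRACTION,
                    safety_fraction=STORAGE_SAFETY_FRACTION):
    """Usable heap implied by the storage memory shown in the web UI."""
    return storage_gb / (memory_fraction * safety_fraction)


# Upper bound if the full 8 GB heap were usable.
print(f"storage memory for an 8 GB heap: {storage_memory_gb(8.0):.2f} GB")

# Heap size implied by the 4.1 GB shown in the UI.
print(f"heap implied by 4.1 GB of storage memory: {implied_heap_gb(4.1):.2f} GB")
```

Under these assumptions, 8 GB yields at most about 4.32 GB of storage memory, and the 4.1 GB in the UI corresponds to a usable heap of roughly 7.6 GB. The JVM usually reports somewhat less than the configured maximum heap, which would account for the difference.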


