spark-user mailing list archives

From Nicolae Marasoiu <>
Subject Re: Problem understanding spark word count execution
Date Thu, 01 Oct 2015 04:57:35 GMT


2- The end results are sent back to the driver; shuffles are the transmission of intermediate
results between nodes, such as across the "->" steps in your pipeline, which are all intermediate transformations.

More precisely, since flatMap and map are narrow dependencies, meaning they can usually happen
on the local node, I bet shuffle is just sending out the textFile to a few nodes to distribute
the partitions.
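
To make the "shuffle write" part more concrete, here is a rough pure-Python sketch (no Spark involved) of what a single stage 0 task is generally understood to do. It assumes the word count job ends with a reduceByKey, which is not shown in the question. The task runs flatMap and map locally, pre-aggregates the counts, and splits its output into one bucket per reduce partition. In Spark those buckets are written to the local disk of the executor that ran the task (that is the "Shuffle write" figure), and the tasks of the next stage fetch them later, so the map task itself does not send them to any particular executor. The names map_task and records_per_reducer and the CRC-based partitioner are made up for illustration; they are not Spark APIs.

```python
import zlib
from collections import defaultdict


def map_task(lines, num_reducers):
    """Simulate one stage 0 task: flatMap -> map -> map-side combine.

    Returns {reduce_partition: {word: partial_count}}.
    Hypothetical helper for illustration only, not a Spark API.
    """
    buckets = defaultdict(lambda: defaultdict(int))
    for line in lines:
        for word in line.split():  # flatMap: line -> words
            # Stand-in for a hash partitioner: a deterministic hash of the
            # key decides which reduce partition will read this record.
            reducer = zlib.crc32(word.encode("utf-8")) % num_reducers
            # map to (word, 1) and combine on the map side
            buckets[reducer][word] += 1
    return {r: dict(counts) for r, counts in buckets.items()}


def records_per_reducer(buckets):
    """Count how many (word, count) records each reduce partition would fetch."""
    return {r: len(counts) for r, counts in buckets.items()}


if __name__ == "__main__":
    out = map_task(["to be or not to be"], num_reducers=2)
    for reducer, counts in sorted(out.items()):
        print("reduce partition", reducer, "->", counts)
    print(records_per_reducer(out))
```

Under this picture, the small result a stage 0 task reports to the driver is status information about where its shuffle output lives, not the word counts themselves. Only the final action's results go back to the driver.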

From: Kartik Mathur <>
Sent: Thursday, October 1, 2015 12:42 AM
To: user
Subject: Problem understanding spark word count execution

Hi All,

I tried running the Spark word count example and I have a couple of questions.

I am analyzing stage 0, i.e.
 sc.textFile -> flatMap -> Map (Word count example)

1) In the stage details in the application UI, every task shows a Shuffle write of
2.7 KB. How can I find out where this task wrote that data, i.e. how many bytes went to
which executor?

2) In the executor's log, the entry for the same task says 2000 bytes of result were sent to
the driver. If the results were sent directly to the driver, what is this shuffle
write?

