spark-user mailing list archives

From Silvio Fiorito <silvio.fior...@granturing.com>
Subject Re: why one of Stage is into Skipped section instead of Completed
Date Sat, 26 Dec 2015 18:14:13 GMT
A skipped stage is one whose shuffle output already exists from an earlier job, which happens
when you re-run a transformation. The executors still have that stage’s output in their local
dirs and Spark recognizes that, so rather than re-computing it, Spark starts from the following
stage. This is a good thing: you’re not re-computing a stage. In your case, it looks like the
output of the userreqs RDD (reduceByKey) already exists, so Spark doesn’t re-compute it.
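
To make the behaviour concrete, here is a minimal sketch of the same pattern, not the original poster's job. It assumes a local master, and the app name and tiny in-memory data sets are hypothetical stand-ins for the accounts and weblog files. The first action materializes the reduceByKey shuffle; the join job that follows can then reuse that output, so its map stage should be listed under Skipped in the web UI while the application is running.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SkippedStageDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: local master and app name chosen only to keep the sketch self-contained.
    val sc = new SparkContext(
      new SparkConf().setAppName("skipped-stage-demo").setMaster("local[2]"))
    try {
      // Hypothetical stand-ins for the accounts and weblog data in the thread.
      val accountsByID = sc.parallelize(Seq(("1", "Smith,Ann"), ("2", "Jones,Bob")))
      val userreqs = sc.parallelize(Seq("1", "2", "1"))
        .map(id => (id, 1))
        .reduceByKey(_ + _)

      // First job: runs the map stage feeding reduceByKey (shuffle files are
      // written to the executors' local dirs) and then the result stage.
      val direct = userreqs.collect().toMap

      // Second job: the join depends on the same shuffle output, so Spark is
      // expected to reuse it and report that map stage as skipped.
      val joined = accountsByID.join(userreqs).map(pair => pair._2).collect().toSet

      assert(direct == Map("1" -> 2, "2" -> 1))
      assert(joined == Set(("Smith,Ann", 2), ("Jones,Bob", 1)))
    } finally {
      sc.stop()
    }
  }
}
```

Skipping does not change the results; it only avoids redoing work whose shuffle output is still available.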

From: Prem Spark <sparksure542@gmail.com>
Date: Friday, December 25, 2015 at 11:41 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: why one of Stage is into Skipped section instead of Completed


What does the Skipped Stage below mean? Can anyone help clarify?
I was expecting 3 stages to succeed, but only 2 of them completed while one was
skipped.
    Status: SUCCEEDED
    Completed Stages: 2
    Skipped Stages: 1

Scala REPL Code Used:

accounts is a basic RDD that contains weblog text data.

var accountsByID = accounts.
  map(line => line.split(',')).
  map(values => (values(0), values(4) + ',' + values(3)))

var userreqs = sc.
  textFile("/loudacre/weblogs/*6").
  map(line => line.split(' ')).
  map(words => (words(2), 1)).
  reduceByKey((v1, v2) => v1 + v2)

var accounthits = accountsByID.join(userreqs).map(pair => pair._2)

accounthits.saveAsTextFile("/loudacre/userreqs")

scala> accounthits.toDebugString
res15: String =
(32) MapPartitionsRDD[24] at map at <console>:28 []
 |   MapPartitionsRDD[23] at join at <console>:28 []
 |   MapPartitionsRDD[22] at join at <console>:28 []
 |   CoGroupedRDD[21] at join at <console>:28 []
 +-(15) MapPartitionsRDD[15] at map at <console>:25 []
 |  |   MapPartitionsRDD[14] at map at <console>:24 []
 |  |   /loudacre/accounts/* MapPartitionsRDD[13] at textFile at <console>:21 []
 |  |   /loudacre/accounts/* HadoopRDD[12] at textFile at <console>:21 []
 |   ShuffledRDD[20] at reduceByKey at <console>:25 []
 +-(32) MapPartitionsRDD[19] at map at <console>:24 []
    |   MapPartitionsRDD[18] at map at <console>:23 []
    |   /loudacre/weblogs/*6 MapPartitionsRDD[17] at textFile at <console>:22 []
    |   /loudacre/weblogs/*6 HadoopRDD[16] at textFile at <con...
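
The indented `+-` branches in the lineage above mark shuffle boundaries, which is where the expectation of 3 stages comes from: one stage per shuffle plus the final result stage. As a rough sketch of that counting, the helper below walks an RDD's dependencies and collects the distinct shuffle ids. The object name, the `shuffleIds` helper, the local master, and the in-memory data are all illustrative assumptions, not part of the original session.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LineageStages {
  // Hypothetical helper: collects the ids of all shuffle dependencies
  // reachable from `rdd` by walking its lineage recursively.
  def shuffleIds(rdd: RDD[_]): Set[Int] =
    rdd.dependencies.flatMap {
      case s: ShuffleDependency[_, _, _] => shuffleIds(s.rdd) + s.shuffleId
      case d => shuffleIds(d.rdd)
    }.toSet

  def main(args: Array[String]): Unit = {
    // Assumption: local master and app name, only to keep the sketch self-contained.
    val sc = new SparkContext(
      new SparkConf().setAppName("lineage-stages").setMaster("local[2]"))
    try {
      // Hypothetical stand-ins for the accounts and weblog data.
      val accountsByID = sc.parallelize(Seq(("1", "Smith,Ann")))
      val userreqs = sc.parallelize(Seq("1", "1"))
        .map(id => (id, 1))
        .reduceByKey(_ + _)
      val accounthits = accountsByID.join(userreqs).map(pair => pair._2)

      // A job on this RDD has one stage per distinct shuffle plus the result stage.
      val stages = shuffleIds(accounthits).size + 1
      println(s"stages in a job on accounthits: $stages")
    } finally {
      sc.stop()
    }
  }
}
```

This only counts the stages a job would contain; whether any of them is skipped depends on which shuffle outputs already exist when the job runs.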





