spark-user mailing list archives

From Tom Hubregtsen <>
Subject Re: Extra stage that executes before triggering computation with an action
Date Wed, 29 Apr 2015 22:57:53 GMT
"I'm not sure, but I wonder if because you are using the Spark REPL that it
may not be representing what a normal runtime execution would look like and
is possibly eagerly running a partial DAG once you define an operation that
would cause a shuffle.

What happens if you setup your same set of commands [a-e] in a file and use
the Spark REPL's `load` or `paste` command to load them all at once?" From

I have also packaged it in a jar file (without [e], the debug string), and
still see the extra stage before the two stages I would expect. Even when I
remove [d], the action, I still see stage 0 being executed (and stages 1 and
2 do not appear).

Again, a shortened log of Stage 0:
INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at
sortByKey), which has no missing parents
INFO DAGScheduler: ResultStage 0 (sortByKey) finished in 0.192 s
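For context, this behavior is consistent with how sortByKey works in Spark 1.x: constructing the transformation builds a RangePartitioner, which immediately submits a small sampling job to compute partition boundaries, so a stage runs even before any action is called (I believe this is the behavior tracked as SPARK-1021). A rough plain-Python sketch of the sampling idea, not Spark's actual code, with illustrative names only:

```python
import random

def range_partition_bounds(keys, num_partitions, sample_size=20):
    # Sample the keys and derive (num_partitions - 1) split points,
    # mimicking the idea behind Spark's RangePartitioner. Computing
    # these bounds requires scanning (sampling) the data, which is why
    # Spark must run a job before the sort itself executes.
    sample = sorted(random.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    # Pick evenly spaced keys from the sorted sample as upper bounds.
    return [sample[int(step * i)] for i in range(1, num_partitions)]

def partition_of(key, bounds):
    # Return the partition index for a key given the split points.
    for i, bound in enumerate(bounds):
        if key < bound:
            return i
    return len(bounds)

keys = list(range(100))
bounds = range_partition_bounds(keys, num_partitions=4)
print(bounds)                    # three split points, in ascending order
print(partition_of(0, bounds))   # smallest key lands in partition 0
```

So the extra stage 0 (labeled sortByKey) would be this sampling pass, and stages 1 and 2 would only appear once an action actually triggers the shuffle and sort.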
