spark-user mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: ResultStage's parent stages only ShuffleMapStages?
Date Fri, 06 Nov 2015 08:36:15 GMT
Right, there are only two kinds of stage: ResultStage and ShuffleMapStage.
A ShuffleMapStage writes shuffle output for downstream stages to consume, but
a ResultStage doesn't need to do that.

I guess you may be confusing these concepts with Map/Reduce. Actually, a
ShuffleMapStage can play the role of either Map or Reduce, as long as it
produces intermediate data for downstream consumption.
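
To make that concrete, here is a toy sketch in plain Python. It is not Spark's
code: the RDD and Stage classes and the parent_stages / new_result_stage
helpers are hypothetical names that only loosely mirror what DAGScheduler
does. The assumption is simply that every shuffle dependency marks a stage
boundary, and that the stage on the producing side of a boundary is a
ShuffleMapStage.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RDD:
    """Hypothetical stand-in for an RDD and its two kinds of dependency."""
    name: str
    narrow_parents: List["RDD"] = field(default_factory=list)   # stay in the same stage
    shuffle_parents: List["RDD"] = field(default_factory=list)  # stage boundary


@dataclass
class Stage:
    kind: str              # "ResultStage" or "ShuffleMapStage"
    rdd: RDD               # last RDD computed by this stage
    parents: List["Stage"]


def parent_stages(rdd: RDD) -> List[Stage]:
    """Walk narrow dependencies; each shuffle dependency found yields a parent stage."""
    parents, seen, stack = [], set(), [rdd]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        stack.extend(current.narrow_parents)
        for upstream in current.shuffle_parents:
            # Whatever produces shuffle output is a ShuffleMapStage by construction.
            parents.append(Stage("ShuffleMapStage", upstream, parent_stages(upstream)))
    return parents


def new_result_stage(final_rdd: RDD) -> Stage:
    """The stage that runs the action; it never writes shuffle output itself."""
    return Stage("ResultStage", final_rdd, parent_stages(final_rdd))


# textFile -> map -> reduceByKey (shuffle) -> map -> sortByKey (shuffle) -> action
lines = RDD("textFile")
pairs = RDD("map", narrow_parents=[lines])
counts = RDD("reduceByKey", shuffle_parents=[pairs])
swapped = RDD("map", narrow_parents=[counts])
sorted_rdd = RDD("sortByKey", shuffle_parents=[swapped])

result = new_result_stage(sorted_rdd)
```

In this model the middle stage (the one ending at the second map) reads the
shuffle output of the first stage, like a Reduce, and writes shuffle output
for sortByKey, like a Map, yet it is still a ShuffleMapStage. The only
ResultStage is the last one, and all of its ancestors are ShuffleMapStages.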




On Fri, Nov 6, 2015 at 4:15 PM, Jacek Laskowski <jacek@japila.pl> wrote:

> Hi,
>
> Just to make sure that what I see in the code and think I understand
> is indeed correct...
>
> When a job is submitted to DAGScheduler, it creates a new ResultStage
> that in turn queries for the parent stages of itself given the RDD
> (using `getParentStagesAndId` in `newResultStage`).
>
> Are a ResultStage's parent stages only ShuffleMapStages?
>
> Pozdrawiam,
> Jacek
>
> --
> Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski


-- 
Best Regards

Jeff Zhang
