crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Question about Spark Job/Stage names
Date Tue, 29 Sep 2015 14:12:01 GMT
Hey Nithin,

I checked around about this-- apparently the stage name is hard-coded to be
the call-site of the code block that triggered the stage:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Stage.scala

Right now, we pass the names for DoFns to the RDDs we create via
RDD.setName, but obviously that doesn't play into the stage name control.
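For illustration, here is a minimal sketch of the setCallSite workaround mentioned below, using Spark's Java API (the app name, RDD name, and call-site strings are made up for the example; this is not how Crunch wires it up internally):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CallSiteDemo {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("callsite-demo")
        .setMaster("local[1]");
    try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
      // Set the call site *before* the action that triggers the stage;
      // the Spark UI then shows this string as the stage description
      // instead of the captured code location.
      jsc.setCallSite("MyDoFn: tokenize");

      JavaRDD<String> words = jsc.parallelize(Arrays.asList("a", "b", "c"));
      words.setName("words-rdd"); // names the RDD, not the stage

      long n = words.count(); // action -> stage labeled "MyDoFn: tokenize"

      // Revert to Spark's default call-site capture for later stages.
      jsc.clearCallSite();
      System.out.println(n);
    }
  }
}
```

As noted in the thread, the call site is context-wide state: every stage triggered while it is set gets the same label, so you would have to set and clear it around each action to get per-DoFn stage names.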

J

On Mon, Sep 28, 2015 at 5:46 PM, Nithin Asokan <anithin19@gmail.com> wrote:

> I'm fairly new to Spark, and would like to understand stage/job names
> when using Crunch on Spark. When I submit my Spark application, I see
> a set of stage names like *mapToPair at PGroupedTableImpl.java:108. *Is
> it possible for user code to update these stage names dynamically?
> Perhaps, is it possible to have DoFn names as stage names?
>
> I did a little bit of digging and the closest thing I can find to modify
> stage name is using
>
> sparkContext.setCallSite(String)
>
> However, this updates all stage and job names to the same text. I tried
> looking at MRPipeline's implementation to understand how job names are
> built, and I believe for SparkPipeline, Crunch does not create a DAG, so
> we don't create a job name.
>
> But does anyone with Spark expertise know if it's possible in Crunch to
> create job/stage names based on DoFn names?
>
> Thank you!
> Nithin
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
