beam-user mailing list archives

From Antony Mayi <antonym...@yahoo.com>
Subject Re: appending beam pipeline to spark job
Date Wed, 10 May 2017 09:08:32 GMT
Very useful, thanks!
Btw, to avoid calling Create.of(rdd.collect()) - is there by any chance a way to get a
PCollection directly from an RDD?
thx,
Antony.
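
For reference, the workaround mentioned above looks roughly like this - a minimal
sketch, assuming the RDD holds plain Strings (the helper name rddToPCollection is
just illustrative, not an existing Beam API):

import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.spark.api.java.JavaRDD;

// Pulls the (small, already reduced) RDD output back to the driver and
// feeds it into the Beam pipeline as a bounded PCollection.
static PCollection<String> rddToPCollection(Pipeline pipeline, JavaRDD<String> rdd) {
  List<String> collected = rdd.collect();
  return pipeline.apply(Create.of(collected));
}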

    On Wednesday, 10 May 2017, 10:37, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
 

 Hi Antony,

yes, it's possible to "inject"/reuse an existing Spark context via the pipeline
options. From SparkPipelineOptions:

  @Description("If the spark runner will be initialized with a provided Spark 
Context. "
      + "The Spark Context should be provided with SparkContextOptions.")
  @Default.Boolean(false)
  boolean getUsesProvidedSparkContext();
  void setUsesProvidedSparkContext(boolean value);
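
A minimal sketch of how this could be wired up, assuming the Spark runner's
SparkContextOptions and a locally created JavaSparkContext standing in for the one
the existing job already owns (class and app names are illustrative):

import org.apache.beam.runners.spark.SparkContextOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ProvidedContextExample {
  public static void main(String[] args) {
    // The Spark context the existing batch job is already running on
    // (created locally here just for illustration).
    JavaSparkContext jsc = new JavaSparkContext(
        new SparkConf().setAppName("existing-job").setMaster("local[*]"));

    // Tell the Spark runner to reuse that context instead of creating its own.
    SparkContextOptions options = PipelineOptionsFactory.as(SparkContextOptions.class);
    options.setRunner(SparkRunner.class);
    options.setUsesProvidedSparkContext(true);
    options.setProvidedSparkContext(jsc);

    Pipeline pipeline = Pipeline.create(options);
    // ... build the Beam pipeline here, e.g. starting from Create.of(...) ...
    pipeline.run().waitUntilFinish();
  }
}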

Regards
JB

On 05/10/2017 10:16 AM, Antony Mayi wrote:
> I've got a (dirty) use case where I have an existing Spark batch job that produces
> an output I would like to feed into my Beam pipeline (assuming it runs on the
> SparkRunner). I was trying to run it as one job (the output is reduced, so it's not
> big data, hence OK to do something like Create.of(rdd.collect())), but that's
> failing because of the two separate Spark contexts.
>
> Is it possible to build the Beam pipeline on an existing Spark context?
>
> thx,
> Antony.

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


   