beam-user mailing list archives

From Antony Mayi <>
Subject Re: appending beam pipeline to spark job
Date Wed, 10 May 2017 09:08:32 GMT
Very useful, thanks!
Btw, to avoid calling Create.of(rdd.collect()) - is there by any chance a way to get a
PCollection directly from an RDD?
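(For reference, the workaround I mean is along these lines - just a sketch, the
variable names rdd and pipeline are made up:

    // Bridge a Spark RDD into Beam by collecting it to the driver.
    // Assumes a JavaRDD<String> named rdd and a Beam Pipeline named pipeline.
    //
    //   import java.util.List;
    //   import org.apache.beam.sdk.transforms.Create;
    //   import org.apache.beam.sdk.values.PCollection;

    List<String> collected = rdd.collect();  // materializes the whole RDD on the driver
    PCollection<String> pc = pipeline.apply(Create.of(collected));

which is only viable because the output is small.)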

    On Wednesday, 10 May 2017, 10:37, Jean-Baptiste Onofré <> wrote:

 Hi Antony,

yes, it's possible to "inject"/reuse an existing Spark context via the pipeline 
options. From the SparkPipelineOptions:

   @Description("If the spark runner will be initialized with a provided Spark Context. "
       + "The Spark Context should be provided with SparkContextOptions.")
   boolean getUsesProvidedSparkContext();
   void setUsesProvidedSparkContext(boolean value);
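For example, something like this should work (an untested sketch, assuming you 
already hold a JavaSparkContext named jsc from your existing batch job):

    //   import org.apache.beam.runners.spark.SparkContextOptions;
    //   import org.apache.beam.runners.spark.SparkRunner;
    //   import org.apache.beam.sdk.Pipeline;
    //   import org.apache.beam.sdk.options.PipelineOptionsFactory;
    //   import org.apache.spark.api.java.JavaSparkContext;

    SparkContextOptions options = PipelineOptionsFactory.as(SparkContextOptions.class);
    options.setRunner(SparkRunner.class);
    options.setUsesProvidedSparkContext(true);
    options.setProvidedSparkContext(jsc);  // reuse the context from the existing job
    Pipeline pipeline = Pipeline.create(options);

That way the Beam pipeline runs on the same Spark context as the batch job instead 
of trying to create a second one.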


On 05/10/2017 10:16 AM, Antony Mayi wrote:
> I've got a (dirty) use case where I have an existing Spark batch job which produces
> an output that I would like to feed into my Beam pipeline (assuming it runs on the
> SparkRunner). I was trying to run it as one job (the output is reduced, so it's not
> big data, hence OK to do something like Create.of(rdd.collect())), but that's
> failing because of the two separate Spark contexts.
> Is it possible to build the Beam pipeline on an existing Spark context?
> thx,
> Antony.

Jean-Baptiste Onofré
Talend -
