airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: SparkOperator - tips and feedback?
Date Sun, 19 Mar 2017 02:24:31 GMT
A Spark operator exists as of 1.8.0 (which will be released tomorrow); you might want to take
a look at that. I know an update is coming to that operator that improves communication
with YARN.

Bolke

> On 18 Mar 2017, at 18:43, Russell Jurney <russell.jurney@gmail.com> wrote:
> 
> Ruslan, thanks for your feedback.
> 
> You mean the spark-submit context? Or like the SparkContext and
> SparkSession? I don't think we could keep that alive, because it wouldn't
> work out with multiple calls to spark-submit. I do feel your pain, though.
> Maybe someone else can see how this might be done?
> 
> If SparkContext was able to open the spark/pyspark console, then multiple
> job submissions would be possible. I didn't have this in mind but an
> InteractiveSparkContext or SparkConsoleContext might be able to do this?
> 
> Russell Jurney @rjurney <http://twitter.com/rjurney>
> russell.jurney@gmail.com LI <http://linkedin.com/in/russelljurney> FB
> <http://facebook.com/jurney> datasyndrome.com
> 
> On Sat, Mar 18, 2017 at 3:02 PM, Ruslan Dautkhanov <dautkhanov@gmail.com>
> wrote:
> 
>> +1 Great idea.
>> 
>> my two cents - it would be nice (as an option) if SparkOperator would be
>> able to keep context open between different calls,
>> as it takes 30+ seconds to create a new context (on our cluster). Not sure
>> how well it fits Airflow architecture.
>> 
>> 
>> 
>> --
>> Ruslan Dautkhanov
>> 
>> On Sat, Mar 18, 2017 at 3:45 PM, Russell Jurney <russell.jurney@gmail.com>
>> wrote:
>> 
>>> What do people think about creating a SparkOperator that uses spark-submit
>>> to submit jobs? Would work for Scala/Java Spark and PySpark. The patterns
>>> outlined in my presentation on Airflow and PySpark
>>> <http://bit.ly/airflow_pyspark> would fit well inside an Operator, I
>>> think.
>>> BashOperator works, but why not tailor something to spark-submit?
>>> 
>>> I'm open to doing the work, but I wanted to see what people thought about
>>> it
>>> and get feedback about things they would like to see in SparkOperator and
>>> get any pointers people had to doing the implementation.
>>> 
>>> Russell Jurney @rjurney <http://twitter.com/rjurney>
>>> russell.jurney@gmail.com LI <http://linkedin.com/in/russelljurney> FB
>>> <http://facebook.com/jurney> datasyndrome.com
>>> 
>> 
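
[Editor's sketch] The proposal above is essentially an operator that assembles a spark-submit command line and runs it, BashOperator-style. A minimal illustration follows; all names (`build_spark_submit_cmd`, the `SparkOperator` shown here, and its parameters) are hypothetical and do not reflect the actual Airflow contrib API:

```python
# Hypothetical sketch of the proposed SparkOperator: assemble a spark-submit
# command list, then run it and fail on a non-zero exit code.
# All names here are illustrative, not the real Airflow API.
import subprocess


def build_spark_submit_cmd(application, master="yarn", deploy_mode="cluster",
                           conf=None, application_args=None,
                           spark_submit="spark-submit"):
    """Build the spark-submit command list for one application.

    Works for Scala/Java jars and PySpark scripts alike, since
    spark-submit accepts both.
    """
    cmd = [spark_submit, "--master", master, "--deploy-mode", deploy_mode]
    for key, value in (conf or {}).items():
        cmd += ["--conf", "%s=%s" % (key, value)]
    cmd.append(application)
    cmd += list(application_args or [])
    return cmd


class SparkOperator(object):
    # In Airflow this would subclass BaseOperator; kept plain here for brevity.
    def __init__(self, application, **kwargs):
        self.cmd = build_spark_submit_cmd(application, **kwargs)

    def execute(self):
        # BashOperator-style behaviour: raise (and fail the task) if
        # spark-submit exits non-zero.
        subprocess.check_call(self.cmd)
```

Note that each `execute()` spawns a fresh spark-submit, so each task pays the full context start-up cost Ruslan mentions; keeping a context alive across calls would need a different mechanism.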

