airflow-dev mailing list archives

From Tao Feng <>
Subject better way to schedule pyspark with SparkOperator on Airflow
Date Thu, 07 Feb 2019 07:26:56 GMT

I wonder if anyone has suggestions on how to use the SparkOperator to send a pyspark file to a Spark cluster, and on how to specify the pyspark dependencies?

We currently push the user's pyspark file and its dependencies to an S3 location, where they get picked up by our Spark cluster. We would like to explore whether there are suggestions on how to improve this workflow.

