[ https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998681#comment-15998681
]
Al Johri edited comment on AIRFLOW-247 at 5/7/17 11:07 PM:
-----------------------------------------------------------
I'm searching for documentation related to how Airflow works with EMR. I'm struggling to find
anything here: https://airflow.incubator.apache.org/integration.html#aws
My main question is, can Airflow create an EMR cluster and bring it back down like AWS Data
Pipeline?
Thanks!
EDIT: Found some information here:
Spark, EMR:
- (uses emr hooks, operators) https://docs.google.com/presentation/d/1NG1P86HRlX43qTVucCTOsFqIbCvYdOhq_np90VlbVRc/edit#slide=id.gd40eeee67_1_0
- (uses shell scripts to launch and terminate EMR clusters) https://www.agari.com/automated-model-building-emr-spark-airflow/
EMR:
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/emr_hook.py
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_create_job_flow_operator.py
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_add_steps_operator.py
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/emr_terminate_job_flow_operator.py
Spark:
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_submit_hook.py
- https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_submit_operator.py
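For reference, here is a minimal sketch of how the EMR operators linked above might be wired together to create a cluster, submit a step, and bring the cluster back down. It is not taken from the Airflow docs. The operator module paths follow the files linked above; the sensor module path (airflow.contrib.sensors.emr_step_sensor) is an assumption based on the contrib layout, and the connection IDs, DAG and task names, S3 path, release label, and instance types are all hypothetical placeholders.

```python
from datetime import datetime


def build_job_flow_overrides(name="airflow-emr-sketch"):
    """Cluster settings in the shape boto3's EMR run_job_flow call accepts."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.5.0",  # assumed release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m3.xlarge",  # assumed instance types
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": 3,
            # Keep the cluster up until the DAG terminates it explicitly.
            "KeepJobFlowAliveWhenNoSteps": True,
            "TerminationProtected": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def build_spark_step(script_uri):
    """One EMR step that runs spark-submit on the given script."""
    return {
        "Name": "run_spark_job",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


def build_dag():
    """Wire create -> add steps -> wait for step -> terminate.

    Airflow is imported lazily so the helpers above work without it.
    """
    from airflow import DAG
    from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
    from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator
    from airflow.contrib.operators.emr_terminate_job_flow_operator import EmrTerminateJobFlowOperator
    # Assumed module path for the sensor mentioned in the issue description.
    from airflow.contrib.sensors.emr_step_sensor import EmrStepSensor

    dag = DAG("emr_sketch", start_date=datetime(2017, 5, 1), schedule_interval=None)

    # The create operator returns the job flow ID, which lands in XCom.
    cluster_id = "{{ task_instance.xcom_pull('create_cluster', key='return_value') }}"
    step_id = "{{ task_instance.xcom_pull('add_steps', key='return_value')[0] }}"

    create = EmrCreateJobFlowOperator(
        task_id="create_cluster",
        job_flow_overrides=build_job_flow_overrides(),
        aws_conn_id="aws_default",  # hypothetical connection IDs
        emr_conn_id="emr_default",
        dag=dag,
    )
    add = EmrAddStepsOperator(
        task_id="add_steps",
        job_flow_id=cluster_id,
        aws_conn_id="aws_default",
        steps=[build_spark_step("s3://example-bucket/job.py")],  # hypothetical path
        dag=dag,
    )
    wait = EmrStepSensor(
        task_id="watch_step",
        job_flow_id=cluster_id,
        step_id=step_id,
        aws_conn_id="aws_default",
        dag=dag,
    )
    terminate = EmrTerminateJobFlowOperator(
        task_id="terminate_cluster",
        job_flow_id=cluster_id,
        aws_conn_id="aws_default",
        trigger_rule="all_done",  # tear down even if the step failed
        dag=dag,
    )

    create.set_downstream(add)
    add.set_downstream(wait)
    wait.set_downstream(terminate)
    return dag
```

The sensor sits between the add-steps and terminate tasks because adding steps only submits them. Without something waiting on the step, the terminate task would most likely run before the Spark job finishes.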
> EMR Hook, Operators, Sensor
> ---------------------------
>
> Key: AIRFLOW-247
> URL: https://issues.apache.org/jira/browse/AIRFLOW-247
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Rob Froetscher
> Assignee: Rob Froetscher
> Priority: Minor
>
> Substory of https://issues.apache.org/jira/browse/AIRFLOW-115. It would be nice to have an EMR hook and operators.
> Hook to generally interact with EMR.
> Operators to:
> * set up and start a job flow
> * add steps to an existing job flow
> A sensor to:
> * monitor completion and status of EMR jobs
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)