airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-160) Parse DAG files through child processes
Date Sun, 31 Jul 2016 19:50:21 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401332#comment-15401332 ]

ASF subversion and git services commented on AIRFLOW-160:
---------------------------------------------------------

Commit fdb7e949140b735b8554ae5b22ad752e86f6ebaf in incubator-airflow's branch refs/heads/master
from [~pauly]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=fdb7e94 ]

[AIRFLOW-160] Parse DAG files through child processes

Instead of parsing the DAG definition files in the same process as the
scheduler, this change parses the files in a child process. This helps
to isolate the scheduler from bad user code.

Closes #1636 from plypaul/plypaul_schedule_by_file_rebase_master
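
The isolation described in the commit message can be illustrated with a
minimal sketch. This is not the code from the commit: it assumes a plain
`subprocess` call to a fresh interpreter, and the names `run_in_child` and
`demo` are hypothetical. The temporary file stands in for a bad DAG file
that exits the interpreter.

```python
import os
import subprocess
import sys
import tempfile


def run_in_child(path, timeout=30):
    """Execute a Python file in a separate interpreter and return its exit code."""
    # subprocess.run kills the child and raises TimeoutExpired if it runs too long.
    result = subprocess.run([sys.executable, path], timeout=timeout)
    return result.returncode


def demo():
    # Stand-in for a bad DAG file: it exits the interpreter when executed.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write("import sys\nsys.exit(1)\n")
    try:
        return run_in_child(f.name)
    finally:
        os.unlink(f.name)


if __name__ == "__main__":
    print("child exit code:", demo())
    print("parent process is still running")
```

The `sys.exit(1)` call ends only the child interpreter. The parent sees a
nonzero exit code and can log it and carry on.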


> Parse DAG files through child processes
> ---------------------------------------
>
>                 Key: AIRFLOW-160
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-160
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>
> Currently, the Airflow scheduler parses all user DAG files in the same process as the
> scheduler itself. We've seen issues in production where bad DAG files cause the scheduler
> to fail. A simple example: if a user script calls `sys.exit(1)`, the scheduler exits as well.
> We've also seen an unusual case where modules loaded by a user DAG affected the operation of
> the scheduler. For better uptime, the scheduler should be resistant to these problematic user
> DAGs.
> The proposed solution is to parse and schedule user DAGs through child processes. This
> way, the main scheduler process is more isolated from bad DAGs. There's a side benefit as
> well - since parsing is distributed among multiple processes, it's possible to parse the DAG
> files more frequently, reducing the latency between when a DAG is modified and when the changes
> are picked up.
> Another issue right now is that all DAGs must be scheduled before any tasks are sent
> to the executor. This means that the frequency of task scheduling is limited by the slowest
> DAG to schedule. The changes needed for scheduling DAGs through child processes will also
> make it easy to decouple this process and allow tasks to be scheduled and sent to the executor
> in a more independent fashion. This way, overall scheduling won't be held back by a slow DAG.
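
The decoupling described in the issue can be sketched as follows. This is a
hedged illustration, not Airflow's scheduler code: it assumes one child
interpreter per file, supervised by a small thread pool, and the names
`parse_file`, `parse_all`, and `demo` and the three sample files are all
hypothetical. Results are handled in completion order, so a slow file does
not delay the handling of the others.

```python
import os
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_file(path, timeout=30):
    """Run one file in its own child interpreter; return (path, exit code)."""
    try:
        result = subprocess.run([sys.executable, path], timeout=timeout)
        return path, result.returncode
    except subprocess.TimeoutExpired:
        return path, None  # None marks a file that did not finish in time


def parse_all(paths, max_children=2):
    """Yield (path, exit code) pairs in completion order, not submission order."""
    with ThreadPoolExecutor(max_workers=max_children) as pool:
        futures = [pool.submit(parse_file, p) for p in paths]
        for future in as_completed(futures):
            yield future.result()


def demo():
    # Three stand-in "DAG files": one slow, one fast, one that exits early.
    sources = {
        "slow.py": "import time\ntime.sleep(1)\n",
        "fast.py": "pass\n",
        "bad.py": "import sys\nsys.exit(1)\n",
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, body in sources.items():
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write(body)
            paths.append(path)
        return [(os.path.basename(p), code)
                for p, code in parse_all(paths, max_children=3)]


if __name__ == "__main__":
    for name, code in demo():
        print(name, "->", code)
```

Each thread only waits on its child process, so the parsing work itself
happens outside the parent interpreter. A real scheduler would act on each
result as it arrives instead of collecting them into a list.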



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
