airflow-dev mailing list archives

From Christopher Bockman <ch...@fathomhealth.co>
Subject how to have good DAG+Kubernetes behavior on airflow crash/recovery?
Date Sun, 17 Dec 2017 18:45:39 GMT
Hi all,

We run DAGs, and sometimes Airflow crashes (for whatever reason--maybe
something as simple as the underlying infrastructure going down).

Currently, we run everything on Kubernetes (including Airflow), so crashes
of the Airflow pods will generally be detected, and the pods will be
restarted.

However, if Airflow crashes while a DAG is running task X, then when
Airflow comes back up, it apparently sees that task X didn't complete, so
it restarts the task (which, in this case, means it spins up an entirely
new instance/pod).  Thus, both runs "X_1" and "X_2" end up executing
simultaneously.
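
For concreteness, the kind of guard we'd presumably have to bolt on
ourselves looks something like this (a minimal sketch using the kubernetes
Python client; the job manifest handling, names, and namespace here are
placeholders, not our actual code):

    # Sketch: only create the Job if no Job with the same name already
    # exists, so a restarted task attempt attaches to the running Job
    # instead of double-launching it.
    from kubernetes import client, config
    from kubernetes.client.rest import ApiException

    def launch_job_if_absent(batch_api, job_manifest, namespace="default"):
        name = job_manifest["metadata"]["name"]
        try:
            # A previous task attempt already created this Job; reuse it.
            return batch_api.read_namespaced_job(name, namespace)
        except ApiException as e:
            if e.status != 404:
                raise
        return batch_api.create_namespaced_job(namespace, job_manifest)

    config.load_incluster_config()  # we run Airflow inside the cluster
    batch = client.BatchV1Api()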

Is there any (out-of-the-box) way to better synchronize state between the
running tasks and Airflow to prevent this?

(For additional context, we currently execute Kubernetes jobs via a custom
operator that basically layers on top of BashOperator...perhaps the new
Kubernetes operator will help address this?)
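
(Heavily simplified, and with illustrative names rather than our exact
code, that operator is shaped roughly like this:)

    # Rough shape of our current approach: shell out to kubectl via
    # BashOperator and poll the Job status until it reports success.
    # (The real version also checks .status.failed and has a timeout.)
    from airflow.operators.bash_operator import BashOperator

    class KubernetesJobOperator(BashOperator):
        def __init__(self, job_name, manifest_path, namespace="default",
                     **kwargs):
            command = (
                "kubectl apply -f {path} -n {ns} && "
                "until [ \"$(kubectl get job {name} -n {ns} "
                "-o jsonpath='{{.status.succeeded}}')\" = \"1\" ]; "
                "do sleep 10; done"
            ).format(path=manifest_path, name=job_name, ns=namespace)
            super(KubernetesJobOperator, self).__init__(
                bash_command=command, **kwargs)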

Thank you in advance for any thoughts,

Chris
