airflow-dev mailing list archives

From Kevin Lam <ke...@fathomhealth.co>
Subject Re: Making Airflow Fault-Tolerant when running Airflow on Kubernetes
Date Wed, 12 Sep 2018 21:35:32 GMT
Hi Daniel,

Thanks for the reply!

No, we haven't looked too deeply into it. Can you elaborate a bit on how
that works? With the KubernetesExecutor, if a DAG is in flight and part of
Airflow goes down, will it be able to recover? How do Airflow workers
reconnect to Pods that were in flight?

On Wed, Sep 12, 2018 at 4:59 PM Daniel Imberman <daniel.imberman@gmail.com>
wrote:

> Hi Kevin,
>
> Have you looked into the KubernetesExecutor? We achieve fault tolerance
> using the Kubernetes resourceVersion to ensure that all state is
> reproducible.
>
> On Wed, Sep 12, 2018 at 1:08 PM Kevin Lam <kevin@fathomhealth.co> wrote:
>
> > Hi all,
> >
> > We currently run Airflow as a Deployment in a Kubernetes cluster. We also
> > use a variant of KubernetesOperator to run our DAGs.
> >
> > We are investigating how best to make Airflow fault-tolerant, in part
> > because we are looking into the use of preemptible VMs [1]. *Has there
> > been much discussion about how to deploy Airflow in a fault-tolerant way?
> > Are there any best practices? Ideally we'd like our Kubernetes-hosted
> > Airflow to support rolling updates for Docker image updates and also
> > recover from components (worker, scheduler, web) going down temporarily,
> > including when DAGs are in flight.*
> >
> > Any advice, ideas and/or feedback appreciated!
> >
> > [1] https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms
> >
>
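
For readers following the thread: the quoted reply mentions using the
Kubernetes resourceVersion for fault tolerance. The sketch below is a toy,
in-memory illustration of the general idea (checkpoint the last version you
processed, then resume watching from it after a restart). It is not
Airflow's KubernetesExecutor code and does not use the real Kubernetes
client. FakeApiServer, Watcher, and the pod names are all hypothetical, and
versions are modelled as integers for simplicity, whereas a real
resourceVersion is an opaque string that a client hands back to the watch
API rather than comparing it.

```python
from dataclasses import dataclass, field


@dataclass
class FakeApiServer:
    """Toy stand-in for the Kubernetes API server (hypothetical)."""
    events: list = field(default_factory=list)  # (version, pod_name, phase)

    def record(self, pod_name, phase):
        # Every change is stored under a new, increasing version number.
        version = len(self.events) + 1
        self.events.append((version, pod_name, phase))
        return version

    def watch(self, since_version=0):
        # Return only the events newer than the caller's last seen version.
        return [e for e in self.events if e[0] > since_version]


@dataclass
class Watcher:
    """Toy watcher that remembers the last version it handled."""
    pod_phases: dict = field(default_factory=dict)
    last_version: int = 0  # assumed to be stored durably in a real system

    def sync(self, server):
        for version, pod_name, phase in server.watch(self.last_version):
            self.pod_phases[pod_name] = phase
            self.last_version = version


server = FakeApiServer()
watcher = Watcher()

server.record("task-a", "Running")
server.record("task-b", "Running")
watcher.sync(server)
checkpoint = watcher.last_version  # saved before the watcher goes down

# The watcher is down while the pods keep making progress.
server.record("task-a", "Succeeded")
server.record("task-b", "Failed")

# A replacement watcher resumes from the checkpoint instead of starting over.
restarted = Watcher(pod_phases=dict(watcher.pod_phases), last_version=checkpoint)
restarted.sync(server)
print(restarted.pod_phases)
```

Because the replacement watcher asks only for events after its checkpoint,
it picks up the state changes it missed while it was down without
reprocessing the earlier ones.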
