airflow-dev mailing list archives

From Eamon Keane <eamon.kea...@gmail.com>
Subject Re: [Discuss] Airflow Kubernetes worker configuration should be parsed from YAML
Date Wed, 06 Mar 2019 15:40:08 GMT
Thanks for starting the discussion David.

Any templating should apply for both kubernetes airflow workers and
kubernetes pod operators. I estimate there are currently around 20 objects
in the pod spec missing (kubectl explain pod.spec --recursive).

The main challenge would probably be getting airflow up to speed with the
full current spec, rather than tracking future changes to it, as the kitchen
sink already appears to be in there.

For comparison, Jenkins uses a combination of four sources for its pod
templates:

* Built-in java objects covering most but not all of pod spec
* Yaml strings
* Yaml files
* Inheritance from base templates

https://github.com/jenkinsci/kubernetes-plugin/blob/master/README.md
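To make the "inheritance from base templates" idea concrete, here is a rough
sketch of layering a per-task override on top of a base pod template once both
have been parsed into plain dicts. The function name, the example labels and
the merge rule (dicts merge recursively, lists and scalars are replaced) are
my own assumptions for illustration; this is not how the Jenkins plugin or
airflow actually does it.

```python
import copy


def merge_pod_templates(base, override):
    """Return a new dict with `override` layered on top of `base`.

    Nested dicts are merged key by key. Lists and scalars in `override`
    replace the base value wholesale, which is a simplification: real
    tooling may merge lists such as containers by name.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_pod_templates(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base template shared by all workers.
base_template = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"labels": {"app": "airflow-worker"}},
    "spec": {"restartPolicy": "Never", "serviceAccountName": "airflow"},
}

# Hypothetical per-task override adding fields the base does not set.
task_override = {
    "metadata": {"labels": {"team": "data"}},
    "spec": {"priorityClassName": "batch-low"},
}

pod = merge_pod_templates(base_template, task_override)
```

With a rule like this, any pod spec field can be set from the override without
airflow having to model that field explicitly.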

Something along the lines of the CRDs you mention, James, might be Tekton
(aka knative-build). Tekton is still in its early stages, but Jenkins-x, for
example, is switching to it for its pipelines. I haven't examined it in enough
detail to know if it would fit neatly with airflow.

https://github.com/knative/build-pipeline/releases/tag/v0.1.0

On Wed, Mar 6, 2019 at 3:18 PM James Meickle
<jmeickle@quantopian.com.invalid> wrote:

> I'm in favor of having a YAML-based option for Kubernetes. We've had to
> internally subclass the Kubernetes operator because it doesn't do what we
> need out of the box; for example, we intercept the object it creates right
> before it is sent so that we can patch in missing features. I think
> it would make sense to make this a sibling class to the existing operator,
> since it can use the same watching/submitting logic, but just accept YAML
> instead. Using existing Airflow templating systems here would make sense
> too, of course.
>
> However, what I'd really like to see is a Helm operator!
>
> Airflow tasks often require temporary resources. Here's an example: we run
> the same container in ~12 different configurations. Each of them requires
> slightly different ConfigMaps. As of right now, we have to manage the
> ConfigMaps out of band from Airflow, because Airflow has no way to maintain
> or update those ConfigMaps. This can lead to pushing the new code to
> Airflow, but forgetting to update the ConfigMaps.
>
> What would be ideal for us is to define the task _and_ its necessary
> resources in a Helm chart (either in the same repo as the DAG, or pointing
> to a semver tag). Then the operator would wait for the entire chart to
> finish successfully, including creating and tearing down resources as
> required.
>
> This would also help in scenarios where we want to run a task outside of
> Airflow. Right now, a lot of our tasks are "baked into" the DAG and can't
> be run without either going through Airflow, or manually copying config
> options from the DAG code. Declaring a task as a resource, and then just
> referencing that resource from Airflow, would allow us to also reference
> that resource in other systems in our infrastructure and ensure that it
> gets invoked in an identical way.
>
> Unfortunately Helm itself has some issues around not really having a
> concept of "one-off" tasks. So we started to build something like this
> in-house but ran into roadblocks. We looked into hacks like storing task
> definitions in a CronJob but I came to the conclusion that a TaskTemplate
> CRD would be needed to support this kind of workflow.
>
> On Wed, Mar 6, 2019 at 10:06 AM david.lum@ssense.com <david.lum@ssense.com>
> wrote:
>
> > Hi,
> >
> > I would like to discuss parsing YAML for the Kubernetes worker
> > configuration instead of programmatically generating the YAML from the
> > Pod and PodRequest Factory, as is done currently.
> >
> > *Motivation:*
> >
> > Kubernetes configuration is quite complex. Instead of using the
> > configuration system that is offered natively by Kubernetes (YAML), the
> > current method involves programmatically recreating this YAML file. Fully
> > re-implementing the configuration in Airflow is taking a lot of time, and
> > at the moment many features available through YAML configuration are not
> > available in Airflow. Furthermore, as the Kubernetes API evolves, the
> > Airflow codebase will have to change with it, and Airflow will be in a
> > constant state of catching up with missing features. This can all
> > be solved by simply parsing the YAML file.
> >
> > *Idea:*
> >
> > Either pass in the YAML as a string or provide a path to the YAML file.
> >
>
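
For what it's worth, the "string or path" idea quoted above could look roughly
like the sketch below. The function name and its two keyword arguments are
hypothetical, not an existing airflow API. It uses PyYAML's safe_load when
PyYAML is installed and falls back to the standard json module otherwise,
which only works for manifests written as JSON.

```python
import json

try:
    import yaml  # PyYAML, an optional third-party dependency
except ImportError:
    yaml = None


def load_pod_manifest(yaml_string=None, yaml_path=None):
    """Parse a pod manifest given either as text or as a file path.

    Hypothetical helper: exactly one of the two arguments must be set.
    """
    if (yaml_string is None) == (yaml_path is None):
        raise ValueError("provide exactly one of yaml_string or yaml_path")

    if yaml_path is not None:
        with open(yaml_path, "r", encoding="utf-8") as handle:
            text = handle.read()
    else:
        text = yaml_string

    # Without PyYAML, only JSON-formatted manifests can be parsed.
    manifest = yaml.safe_load(text) if yaml is not None else json.loads(text)

    # Minimal sanity check; a real implementation would validate far more.
    if not isinstance(manifest, dict) or manifest.get("kind") != "Pod":
        raise ValueError("expected a single Pod manifest")
    return manifest
```

The parsed dict could then be templated and submitted as-is, so new pod spec
fields would not need matching code changes in airflow.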
