airflow-dev mailing list archives

From Daniel Imberman <daniel.imber...@gmail.com>
Subject Re: Pods not being TERMinated properly for K8s airflow
Date Tue, 12 Mar 2019 03:41:35 GMT
Oh yikes. Thanks for catching this @Jarek! @Greg Neiheisel
<greg@astronomer.io>, were there a lot of caveats to deploying with tini, or
would this be a fairly straightforward fix?

On Sun, Mar 10, 2019 at 5:13 PM Greg Neiheisel <greg@astronomer.io> wrote:

> I can confirm that wrapping airflow with tini/dumb-init works pretty well.
> We've been relying on it at Astronomer for the past year. We run
> exclusively on k8s and are restarting airflow pods very frequently. Here's
> our production 1.10.2 image using tini on Alpine, for example:
>
> https://github.com/astronomer/astronomer/blob/master/docker/airflow/1.10.2/Dockerfile#L102
>
> On Sun, Mar 10, 2019 at 3:49 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> wrote:
>
> > NOTE! I changed the subject to not pollute the AIP-12 thread
> >
> > Fokko,
> >
> > I think I know why TERM signals do not work in the current pod
> > implementation. I have experienced this several times: a dockerized app
> > not receiving the TERM signal. The reason was always the same. It is not
> > actually a bug; it is expected behaviour when your ENTRYPOINT is in the
> > shell form ("/binary arg1 arg2") or when you use a shell script as the
> > first ENTRYPOINT argument in the [ "shell script", "arg" ] form.
> >
> > In those cases the "shell" process becomes the init process (PID 1).
> > Unfortunately, a shell process is not prepared to do everything that a
> > proper init process should be doing:
> >
> > * Inherit orphaned child processes
> > * Reap them
> > * Handle and propagate signals properly
> > * Wait until all subprocesses are terminated before terminating itself
> >
> > What happens is that the TERM signal is delivered only to the init
> > "shell" process; it does not reach any of its children, and the
> > container continues to run. It's a well-known problem in the Docker
> > world and there are a number of solutions (including exec-ing from the
> > shell or, better, using dumb-init/tini in your ENTRYPOINT: very tiny
> > "proper" init implementations that do what they should do).
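
A minimal sketch of the init duties listed above, written in Python purely for illustration. tini and dumb-init are small native programs and this is not their implementation; the helper name run_as_init and the set of forwarded signals are assumptions. It starts one child, forwards TERM/INT/HUP to it, reaps every child that exits, and returns the main child's exit status (Unix only).

```python
import os
import signal
import sys

# Assumption: a small set of signals to forward; real init wrappers forward more.
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def run_as_init(argv):
    """Hypothetical helper: run argv as a child and act as its init process."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the real program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; never fall back into parent code.
            os._exit(127)

    def forward(signum, _frame):
        # Pass the signal to the main child instead of acting on it ourselves.
        try:
            os.kill(pid, signum)
        except ProcessLookupError:
            pass  # the child has already exited

    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    exit_code = 0
    try:
        while True:
            try:
                # Blocks until any child exits. When running as PID 1 this
                # also reaps orphans that the kernel re-parented to us.
                reaped, status = os.wait()
            except ChildProcessError:
                break  # no children left, so it is safe to exit
            if reaped == pid:
                if os.WIFSIGNALED(status):
                    # Shell convention: 128 + signal number.
                    exit_code = 128 + os.WTERMSIG(status)
                else:
                    exit_code = os.WEXITSTATUS(status)
    finally:
        # Restore whatever handlers were installed before we started.
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)
    return exit_code
```

Used as a container entrypoint, something like this would be called with the real command as its argument list, for example run_as_init(sys.argv[1:]). In practice tini or dumb-init fill that role with far less overhead.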
> >
> > You can read more, for example, here:
> > https://hynek.me/articles/docker-signals
> >
> > J.
> >
> > On Sun, Mar 10, 2019 at 7:29 PM Driesprong, Fokko <fokko@driesprong.frl>
> > wrote:
> >
> > > Ps. Jarek, interesting idea. It shouldn't be too hard to make Airflow
> > > more k8s native. You could package your DAGs within your container, and
> > > do a rolling update. Add the DAGs as the last layer, and then point the
> > > DAGs folder to the same location. The hard part here is that you need
> > > to gracefully restart the workers. Currently, AFAIK, the signals given
> > > to the pod aren't respected. So when the scheduler/webserver/worker
> > > receives a SIGTERM, it should stop the jobs nicely and then exit the
> > > container, before k8s kills the container using a SIGKILL. This will be
> > > challenging with the workers, which are potentially long-running. Maybe
> > > it will be good enough to stop kicking off new jobs and let the old
> > > ones finish, but then we need to increase the standard kill timeout
> > > substantially. Having this would also enable autoscaling of the
> > > workers.
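
A rough sketch of the "stop kicking off new jobs, and let the old ones finish" idea, assuming a simple in-process job queue. GracefulWorker and its methods are hypothetical names for illustration, not Airflow APIs. The SIGTERM handler only sets a flag, so the job in progress completes and the loop exits before picking up the next one.

```python
import signal


class GracefulWorker:
    """Hypothetical worker loop that drains on SIGTERM instead of dying mid-job."""

    def __init__(self, jobs):
        self.jobs = list(jobs)   # callables waiting to run
        self.stopping = False

    def request_stop(self, signum=None, frame=None):
        # Only record the request; the job in progress is allowed to finish.
        self.stopping = True

    def run(self):
        """Run queued jobs until done or asked to stop; return how many were left."""
        previous = signal.signal(signal.SIGTERM, self.request_stop)
        try:
            while self.jobs and not self.stopping:
                job = self.jobs.pop(0)
                job()
        finally:
            # Put back whatever SIGTERM handler was installed before.
            if previous is not None:
                signal.signal(signal.SIGTERM, previous)
        return len(self.jobs)
```

For this to help on Kubernetes, the longest job has to finish within the pod's termination grace period (terminationGracePeriodSeconds, 30 seconds by default), which is the kill timeout that would need to be raised.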
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > E: jarek.potiuk@polidea.com
> >
>
>
> --
> *Greg Neiheisel* / CTO Astronomer.io
>
