airflow-dev mailing list archives

From harish singh <harish.sing...@gmail.com>
Subject Re: monitoring the existing pipeline
Date Tue, 28 Jun 2016 16:02:57 GMT
Yup. I have added retries to those tasks (tuned depending on whether they are
short-running or long-running jobs), and I have a DAG-level dagrun_timeout of
1 hour. But for some reason this doesn't work: if a task exhausts all its
retries, the DAG remains in the running state forever, and once the number of
active DAG runs reaches the parallelism setting in airflow.cfg, no further
jobs are scheduled. I did start a separate thread for this.
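
For reference, here is a minimal sketch of the retry/timeout setup described
above. The DAG id, task, schedule, and owner are illustrative placeholders,
not names from our actual pipeline:

```python
# Hedged sketch: per-task retries plus a DAG-level dagrun_timeout.
# All identifiers below are assumptions for illustration only.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "retries": 5,                       # per-task retry count
    "retry_delay": timedelta(hours=1),  # wait an hour between retries
}

dag = DAG(
    "hourly_pipeline",
    default_args=default_args,
    start_date=datetime(2016, 6, 1),
    schedule_interval="@hourly",
    dagrun_timeout=timedelta(hours=1),  # cap each DAG run at one hour
)

process = BashOperator(task_id="process",
                       bash_command="echo processing",
                       dag=dag)
```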


The goal is to have an automated pipeline monitor running behind the main
pipeline (maybe once a day) that '*clears*' the state of failed tasks (or,
ideally, puts them in a state where the scheduler picks them up) so that they
go back to running. This would mean no one has to manually re-run a failed
task once the bug gets fixed.
I am doing this with scripts that use the Airflow CLI: "airflow
task_state" / "airflow clear".
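
Concretely, such a monitor could look something like the sketch below. It
assumes placeholder DAG/task names, that "airflow task_state" prints the
state as the last line of its output, and that "airflow clear" accepts the
-t/-s/-e/--no_confirm flags as in Airflow 1.x:

```python
# Hedged sketch of a daily monitor: for each of the last N hourly
# execution dates, shell out to the Airflow CLI, check the task state,
# and clear failed task instances so the scheduler re-runs them.
import subprocess
from datetime import datetime, timedelta

DAG_ID = "hourly_pipeline"   # placeholder DAG name
TASK_ID = "process"          # placeholder task name


def hourly_execution_dates(now, hours):
    """Execution dates of the last `hours` hourly runs, newest first."""
    top = now.replace(minute=0, second=0, microsecond=0)
    return [top - timedelta(hours=i) for i in range(1, hours + 1)]


def task_state(dag_id, task_id, exec_date):
    """Return the state string (last line of `airflow task_state` output)."""
    out = subprocess.check_output(
        ["airflow", "task_state", dag_id, task_id, exec_date.isoformat()]
    )
    return out.decode().strip().splitlines()[-1]


def clear_task(dag_id, task_id, exec_date):
    """Clear one task instance so the scheduler picks it up again."""
    subprocess.check_call(
        ["airflow", "clear", dag_id,
         "-t", "^%s$" % task_id,          # regex matching just this task
         "-s", exec_date.isoformat(),     # start of the date window
         "-e", exec_date.isoformat(),     # end of the date window
         "--no_confirm"]                  # skip the interactive prompt
    )


if __name__ == "__main__":
    for exec_date in hourly_execution_dates(datetime.utcnow(), 24):
        if task_state(DAG_ID, TASK_ID, exec_date) == "failed":
            clear_task(DAG_ID, TASK_ID, exec_date)
```

Factoring the date arithmetic and CLI calls into separate functions keeps the
decision logic easy to test without a running Airflow installation.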


Thanks

On Mon, Jun 27, 2016 at 8:25 AM, Lance Norskog <lance.norskog@gmail.com>
wrote:

> You can add retries to the task, including a timeout and a counter. So,
> 5 retries with an hour in between might be a strategy.
>
>
> On Sat, Jun 25, 2016 at 7:24 PM, harish singh <harish.singh22@gmail.com>
> wrote:
>
> > Hi guys,
> >
> > I am trying to build a pipeline/script to monitor our Data-processing
> > pipeline :)
> >
> > Basically, I am trying to do these things:
> > 1. Go back in time n hours and get the status of a task for the last n
> > hours (assuming hourly jobs).
> >    I can use the airflow CLI command "*task_state*" to achieve this.
> >
> > So this tells me where the job has failed/succeeded/running etc.
> >
> >
> > 2. Once I figure out that some execution of a task is in the "failed"
> > state, I want to change its state back to "running" so that the
> > scheduler picks it up and runs it.
> > *Is there a way to do this?*
> >
> > I think one way to do this is:
> >   if a task is in the failed state ---> use "airflow clear" and CLEAR
> > the state, so that the scheduler picks it up.
> > But I am not sure how much I can depend on this approach. Will it
> > always work?
> >
> >
> > I just want to think out loud and ask if there is a better way of doing
> > this that I am not seeing. Either through code, or a new monitoring
> > pipeline?
> >
> >
> > Thanks,
> > Harish
> >
>
>
>
> --
> Lance Norskog
> lance.norskog@gmail.com
> Redwood City, CA
>
