airflow-dev mailing list archives

From harish singh <harish.sing...@gmail.com>
Subject Re: Final task in a backfill is run over and over - caught in a weird failure loop
Date Mon, 02 May 2016 22:38:24 GMT
Any further thoughts on this?

On Sat, Apr 30, 2016 at 5:08 PM, harish singh <harish.singh22@gmail.com>
wrote:

> Is it possible that the sql you're running to get customer ids is not the
> same every time? That's what I (loosely) meant by non-deterministic.
> [response]
> The sql is the same. But it is definitely possible for something to go
> wrong inside the method " getCustomerIds( //run some sql ) ", for example
> some kind of SqlException.
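> To illustrate that failure mode (a rough sketch only -- the cached
> fallback below is hypothetical, not something we run today): if the query
> ever fails during a parse, it would be safer to reuse the last known list
> than to end up generating a different set of DAGs:
>
> import json
> import logging
>
> CACHE_PATH = "/tmp/customer_ids.json"  # hypothetical cache location
>
> def get_customer_ids_safe():
>     """Return customer ids, falling back to the last known list on errors."""
>     try:
>         ids = getCustomerIds()  # runs the sql; may raise e.g. a SqlException
>         with open(CACHE_PATH, "w") as f:
>             json.dump(ids, f)
>         return ids
>     except Exception:
>         logging.exception("customer id query failed; using cached list")
>         with open(CACHE_PATH) as f:
>             return json.load(f)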
>
>
> -- Now if the customer is deleted, or let's say airflow cannot find that
> DAG: even then, the pipeline should not stall, right? It's ok if the
> customer is not found and there is some issue with that ONE dag. But why
> would the entire pipeline get stuck?
>
> For now, our DAGs are pretty much static. The customer list is also going
> to remain the same. Irrespective of which pattern we choose, the pipeline
> should not stop.
>
> This is what my logs show. I monitored for about 45 minutes and saw no
> progress.
>
> [2016-04-29 06:56:43,013] {jobs.py:813} INFO - [backfill progress]
> waiting: 104 | succeeded: 112 | kicked_off: 100 | failed: 0 | wont_run: 0
> [2016-04-29 06:56:48,010] {jobs.py:813} INFO - [backfill progress]
> waiting: 104 | succeeded: 112 | kicked_off: 100 | failed: 0 | wont_run: 0
> [2016-04-29 06:56:53,014] {jobs.py:813} INFO - [backfill progress]
> waiting: 104 | succeeded: 112 | kicked_off: 100 | failed: 0 | wont_run: 0
> [2016-04-29 06:56:58,010] {jobs.py:813} INFO - [backfill progress]
> waiting: 104 | succeeded: 112 | kicked_off: 100 | failed: 0 | wont_run: 0
>
> On Sat, Apr 30, 2016 at 6:21 AM, Jeremiah Lowin <jlowin@apache.org> wrote:
>
>> Is it possible that the sql you're running to get customer ids is not the
>> same every time? That's what I (loosely) meant by non-deterministic.
>>
>> The error message suggests that Airflow is sending out an instruction to
>> run a DAG called "Pipeline_DEVTEST_CDB_DEVTEST_00_B10C8DBE1CFA89C1F274B"
>> which it expects to find in pipeline.py. However, it is not finding any
>> DAG
>> object by that name. That's why I'm wondering if your code is always
>> generating the exact same DAGs.
>>
>> For example, if customer "DEVTEST_CDB_DEVTEST_00_B10C8DBE1CFA89C1F274B"
>> had
>> been deleted from your database, then Airflow might send out a command to
>> run that DAG immediately before the deletion and be unable to load it the
>> next time it parses the file.
>>
>> In general, Airflow assumes that DAGs are static and unchanging (or slowly
>> changing at best). If you want to have parameterized (per-customer)
>> workflows, it might be best to create a single DAG whose tasks define your
>> general workflow (for example, "retrieve customer i information" ->
>> "process customer i information" -> "store customer i information"). That
>> way your DAG and tasks remain stable. Perhaps someone else on the list
>> could share an effective pattern along those lines.
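>> Something along these lines, roughly (an untested sketch: the dag_id,
>> default_args, and the stub callable here are placeholders, and
>> getCustomerIds is the helper from your example):
>>
>> from datetime import datetime, timedelta
>>
>> from airflow import DAG
>> from airflow.operators.python_operator import PythonOperator
>>
>> default_args = {"owner": "airflow", "start_date": datetime(2016, 4, 1)}
>>
>> dag = DAG("customer_pipeline", default_args=default_args,
>>           schedule_interval=timedelta(minutes=60))
>>
>> def handle(step, customer_id, **kwargs):
>>     # Placeholder for the real retrieve/process/store logic.
>>     print("%s customer %s" % (step, customer_id))
>>
>> for customer_id in getCustomerIds():
>>     # One retrieve -> process -> store chain per customer, all inside a
>>     # single DAG whose dag_id never changes.
>>     prev = None
>>     for step in ("retrieve", "process", "store"):
>>         task = PythonOperator(task_id="%s_%s" % (step, customer_id),
>>                               python_callable=handle,
>>                               op_kwargs={"step": step,
>>                                          "customer_id": customer_id},
>>                               dag=dag)
>>         if prev is not None:
>>             task.set_upstream(prev)
>>         prev = task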
>>
>> Now, if that isn't your situation, something strange is happening that's
>> preventing Airflow from locating the DAG.
>>
>> On Fri, Apr 29, 2016 at 6:49 PM harish singh <harish.singh22@gmail.com>
>> wrote:
>>
>> > "However a DAG with such a complicated name isn't referenced in the
>> > example code (just "Pipeline" + i). My guess is that the DAG id is being
>> > generated in a non-deterministic or time-based way, and therefore the
>> > run command can't find it once the generation criteria change. But hard
>> > to say without more detail."
>> > [response] I am not sure what you mean by a non-deterministic way.
>> > We are dynamically creating DAGs (one DAG per customer). So if there are
>> > 100 customers with ids 1, 2, 3 ... 100, there will be 100 pipelines/dags
>> > with names:
>> >
>> > Pipeline_1, Pipeline_2, Pipeline_3 ...... Pipeline_100
>> >
>> > The customer names I get from our database table which stores this
>> > information.
>> >
>> > So the flow is:
>> >
>> > customer_id_list = getCustomerIds()  # run some sql
>> > for customerId in customer_id_list:
>> >     dag = DAG("Pipeline_" + str(customerId), default_args=default_args,
>> >               schedule_interval=datetime.timedelta(minutes=60))
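>> >
>> > To expand that into a fuller sketch (simplified -- default_args here is
>> > a placeholder, and the globals() assignment at the end is the usual way
>> > to make each generated DAG discoverable by its dag_id when pipeline.py
>> > is re-parsed; I am not claiming our file already does exactly this):
>> >
>> > import datetime
>> >
>> > from airflow import DAG
>> >
>> > default_args = {"owner": "airflow",
>> >                 "start_date": datetime.datetime(2016, 4, 1)}
>> >
>> > for customerId in getCustomerIds():  # must yield the same ids on every parse
>> >     dag_id = "Pipeline_" + str(customerId)
>> >     dag = DAG(dag_id, default_args=default_args,
>> >               schedule_interval=datetime.timedelta(minutes=60))
>> >     # Expose the DAG as a module-level variable so the DagBag can find
>> >     # it by dag_id when a worker re-parses this file.
>> >     globals()[dag_id] = dag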
>> >
>> >
>> > Let me know if that helps (or confuses :) ) you in understanding the
>> > flow. If there is something wrong in what I am doing, I would love to
>> > know what it is. This seems to be a serious issue (if it is), especially
>> > while running backfill.
>> >
>> >
>> >
>> > On Fri, Apr 29, 2016 at 12:19 PM, Jeremiah Lowin <jlowin@gmail.com>
>> wrote:
>> >
>> > > That error message usually means that an error took place inside
>> > > Airflow before the task ran -- maybe something with setting up the
>> > > task? The task's state is NONE, meaning it never even started, but the
>> > > executor is reporting that it successfully sent the command to start
>> > > the task (SUCCESS)... the culprit is some failure in between.
>> > >
>> > > The error message seems to say that the DAG itself couldn't be loaded
>> > > from the .py file:
>> > >
>> > > airflow.utils.AirflowException: DAG
>> > > [Pipeline_DEVTEST_CDB_DEVTEST_00_B10C8DBE1CFA89C1F274B]
>> > > could not be found in /usr/local/airflow/dags/pipeline.py
>> > >
>> > > However a DAG with such a complicated name isn't referenced in the
>> > > example code (just "Pipeline" + i). My guess is that the DAG id is
>> > > being generated in a non-deterministic or time-based way, and therefore
>> > > the run command can't find it once the generation criteria change. But
>> > > hard to say without more detail.
>> > >
>> > >
>> > >
>> > > On Fri, Apr 29, 2016 at 3:11 PM Bolke de Bruin <bdbruin@gmail.com>
>> > wrote:
>> > >
>> > > > I would really like to know what the use case is for a
>> > > > depends_on_past on the *task* level. What past are you trying to
>> > > > depend on?
>> > > >
>> > > > What I am currently assuming from just reading the example and
>> > > > replying on my phone is that the depends_on_past prevents execution.
>> > > > Have t4 and t5 ever run?
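>> > > >
>> > > > To be concrete about what I mean by the task level, a minimal
>> > > > illustration (t4/t5 here are just stand-ins for the tasks in the
>> > > > example, not the real code):
>> > > >
>> > > > from datetime import datetime
>> > > >
>> > > > from airflow import DAG
>> > > > from airflow.operators.bash_operator import BashOperator
>> > > >
>> > > > dag = DAG("example", start_date=datetime(2016, 4, 1),
>> > > >           schedule_interval="@daily")
>> > > >
>> > > > # depends_on_past on the *task*: t4 for a given execution date will
>> > > > # not be scheduled until t4 for the previous execution date succeeded.
>> > > > t4 = BashOperator(task_id="t4", bash_command="echo t4",
>> > > >                   depends_on_past=True, dag=dag)
>> > > > t5 = BashOperator(task_id="t5", bash_command="echo t5",
>> > > >                   depends_on_past=True, dag=dag)
>> > > > t5.set_upstream(t4)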
>> > > >
>> > > > Bolke
>> > > >
>> > > > Sent from my iPhone
>> > > >
>> > > > > On 29 apr. 2016, at 21:01, Chris Riccomini <criccomini@apache.org>
>> > > > > wrote:
>> > > > >
>> > > > > @Bolke/@Jeremiah, do you guys think this is related? Full thread
>> > > > > is here:
>> > > > >
>> > > > > https://groups.google.com/forum/?pli=1#!topic/airbnb_airflow/y7wt3I24Rmw
>> > > > >
>> > > > >> On Fri, Apr 29, 2016 at 11:57 AM, Chris Riccomini <chrisr@wepay.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >> Please subscribe to the dev@ mailing list. Sorry to make you jump
>> > > > >> through hoops--I know it's annoying--but it's for a good cause. ;)
>> > > > >>
>> > > > >> This looks like a bug. I'm wondering if it's related to
>> > > > >> https://issues.apache.org/jira/browse/AIRFLOW-20. Perhaps the
>> > > > >> backfill is causing a mis-alignment between the dag runs, and
>> > > > >> depends_on_past logic isn't seeing the prior execution?
>> > > > >>
>> > > >
>> > >
>> >
>>
>
>
