airflow-dev mailing list archives

From Chris Riccomini <criccom...@apache.org>
Subject Re: Running on master
Date Wed, 15 Jun 2016 16:16:47 GMT
Hey Lance,

Thanks for the suggestions. A variation of your idea, which was suggested
at the meetup yesterday, was to snapshot the DB after a "successful" run of
the integration tests. Then, on future runs, we can snapshot the DB again,
and check that the past snapshot "matches" the current snapshot.

This is also a variation of one of my original points:

   - Use a script outside of Airflow to do the checking. Something that
   would snapshot the current state, so that you could diff the state of the
   DB over time, to get around (1) above.

There are still some drawbacks to the snapshot "source of truth" idea,
though. We'd have to update it every time we add a new DAG, and would
probably have to write custom logic for the validator to make sure that
things are checked properly. Still, this idea seems promising to me. It's a
little different, since it's a discrete integration test with an end to it,
but we could make the test run for a long period of time (hours), and run
it daily on master.
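
Roughly, I'm picturing something along these lines (just a sketch: it reads
the task_instance table directly with SQLAlchemy; the connection string, the
"golden" snapshot file name, and the whole-table dump are placeholders, and
the real validator would still need the custom logic mentioned above):

    import json

    from sqlalchemy import create_engine

    # Placeholder connection string; point it at the Airflow metadata DB.
    engine = create_engine("mysql://airflow:airflow@localhost/airflow")

    def snapshot():
        """Map (dag_id, task_id, execution_date) -> state for every task instance."""
        rows = engine.execute(
            "SELECT dag_id, task_id, execution_date, state FROM task_instance")
        return {"%s|%s|%s" % (r[0], r[1], r[2]): r[3] for r in rows}

    def diff(previous, current):
        """Task instances whose state changed, appeared, or disappeared."""
        keys = set(previous) | set(current)
        return {k: (previous.get(k), current.get(k))
                for k in keys if previous.get(k) != current.get(k)}

    # After a known-good run:
    #     json.dump(snapshot(), open("golden_snapshot.json", "w"), indent=2)
    # On later runs:
    #     changes = diff(json.load(open("golden_snapshot.json")), snapshot())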

Thoughts, anyone?

Cheers,
Chris

On Fri, Jun 10, 2016 at 6:33 PM, Lance Norskog <lance.norskog@gmail.com>
wrote:

> Include all variations in the DAG name, like "test1_mysql_daily" for
> something that tests the mysql operator daily.
> Include stage and sequence numbers inside all task ids.
>
> Now you can read the database tables for patterns. Search for all task ids,
> sort them, and check that finish times are monotonic?
> The database is your friend here. It is supposed to store the current state
> of the installation, and the current state should always follow a set of
> rules.
> In fact, I would add triggers for all transactions and create a separate
> time series database of the complete change set. At every discrete time T,
> all of the rules should be correct. (I have no experience doing this class
> of test!)
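
(A minimal sketch of the kind of pattern check described above, assuming the
standard task_instance table, a placeholder SQLAlchemy connection string, and
task ids that end in a sequence number per the naming convention, so that a
lexical sort of task ids matches the intended execution order:

    from itertools import groupby

    from sqlalchemy import create_engine

    engine = create_engine("mysql://airflow:airflow@localhost/airflow")  # placeholder

    rows = engine.execute(
        "SELECT dag_id, execution_date, task_id, end_date FROM task_instance "
        "WHERE state = 'success' AND end_date IS NOT NULL "
        "ORDER BY dag_id, execution_date, task_id").fetchall()

    # Within each DAG run, task ids sort by their embedded sequence number,
    # so finish times should be non-decreasing in that order.
    for run, tis in groupby(rows, key=lambda r: (r[0], r[1])):
        end_dates = [r[3] for r in tis]
        if end_dates != sorted(end_dates):
            print("non-monotonic finish times in %s @ %s" % run)
)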
>
> Lance
>
> On Fri, Jun 10, 2016 at 1:55 PM, Chris Riccomini <criccomini@apache.org>
> wrote:
>
> > Hey all,
> >
> > I want to run Airflow on master in a test environment. My thought was
> > simply to:
> >
> > 1. Generate some test DAGs that do various things (timeouts, SLAs, pools,
> > start/end date, etc)
> > 2. Auto-install master every day, and restart Airflow to run the latest
> > code.
> > 3. Validate that things are working properly.
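
(As a concrete example of the kind of DAG in (1), something this small would
already exercise a timeout, an SLA, a pool, and start/end dates. A sketch
against the DAG/BashOperator API; the dag_id, the "test_pool" pool, and the
deliberately short timeout and SLA are made-up values, and the pool would
have to be created first:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id="synthetic_timeout_daily",   # hypothetical name
        schedule_interval="@daily",
        start_date=datetime(2016, 6, 1),
        end_date=datetime(2016, 12, 31),
    )

    # Sleeps past its execution_timeout on purpose, so every run should fail
    # with a timeout and should also leave an SLA miss behind.
    slow = BashOperator(
        task_id="sleep_past_timeout",
        bash_command="sleep 120",
        execution_timeout=timedelta(seconds=30),
        sla=timedelta(minutes=1),
        pool="test_pool",                   # hypothetical pool
        dag=dag,
    )
)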
> >
> > First, does this sound useful? If so, does my plan of attack sound like
> > the right one?
> >
> > (3) is what I'm running into trouble with right now. I'm really struggling
> > to figure out the best way to monitor for when things go wrong. Some of the
> > things that I want to monitor are:
> >
> >    - All DAGs are executed according to their CRON schedule (if set)
> >    - Timeouts are timing out as expected
> >    - All parallelism limits are honored
> >    - SLA misses are being logged appropriately
> >    - Validate priority is honored
> >    - There are no DAG import errors
> >    - The start_date/end_date are honored properly for DAGs
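
(A couple of the items above map fairly directly onto metadata tables, so an
external checker could start as simply as this. A sketch; it assumes the
standard import_error and sla_miss tables and a placeholder connection string:

    from sqlalchemy import create_engine

    engine = create_engine("mysql://airflow:airflow@localhost/airflow")  # placeholder

    # "No DAG import errors": this table should simply be empty.
    import_errors = engine.execute(
        "SELECT filename, stacktrace FROM import_error").fetchall()
    if import_errors:
        print("unexpected import errors: %r" % (import_errors,))

    # "SLA misses are being logged": DAGs built to blow their SLA on purpose
    # should leave rows here; zero rows would mean the check itself is broken.
    sla_misses = engine.execute(
        "SELECT dag_id, task_id, execution_date FROM sla_miss").fetchall()
    if not sla_misses:
        print("expected SLA misses, found none")
)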
> >
> > What I tried to do this morning was write an operator for each one of
> > these, and have a DAG that would do the validation. Some problems with
> > this approach:
> >
> >    1. It doesn't work so well for DAGs that change over time (e.g. a new
> >    task was added, the schedule_interval was changed). As soon as the DAG
> >    changes, the prior executions appear to be misbehaving. If you change a
> >    schedule_interval from daily to hourly, the past looks like it's missed
> >    a bunch of executions.
> >    2. Some of these are fairly annoying to test. For example, validating
> >    that parallelism is honored means looking at every start_date/end_date of
> >    every task/DAG, and making sure they weren't overlapping in a way that
> >    exceeded one of the N parallelism knobs that Airflow has.
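
(For the overlap check in (2), one option is a simple sweep over task_instance
start/end times rather than comparing every pair. A sketch; the connection
string and the limit being validated are placeholders:

    from sqlalchemy import create_engine

    engine = create_engine("mysql://airflow:airflow@localhost/airflow")  # placeholder
    LIMIT_TO_CHECK = 32  # stand-in for whichever parallelism knob is under test

    rows = engine.execute(
        "SELECT start_date, end_date FROM task_instance "
        "WHERE start_date IS NOT NULL AND end_date IS NOT NULL").fetchall()

    # Sweep the timeline: +1 at each start, -1 at each end, and track the
    # peak number of task instances running at once.
    events = sorted([(s, 1) for s, e in rows] + [(e, -1) for s, e in rows])
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)

    if peak > LIMIT_TO_CHECK:
        print("saw %d concurrent task instances, limit is %d"
              % (peak, LIMIT_TO_CHECK))
)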
> >
> > Other ways that I considered testing this were to:
> >
> >    - Use a cluster policy that would attach a checker task at the beginning
> >    of a DAG somehow.
> >    - Use a script outside of Airflow to do the checking. Something that
> >    would snapshot the current state, so that you could diff the state of the
> >    DB over time, to get around (1) above.
> >    - Use StatsD to monitor for errors. This was problematic because things
> >    aren't well instrumented, and a large number of the things that I want to
> >    check for are more active monitoring things (wake up and check), not
> >    metric-based things.
> >
> > It kind of feels like I'm thinking about this a bit wrong, so I'm looking
> > for thoughtful suggestions. The end goal is that I want to run synthetic
> > DAGs on master, and know when things broke.
> >
> > Cheers,
> > Chris
> >
>
>
>
> --
> Lance Norskog
> lance.norskog@gmail.com
> Redwood City, CA
>
