airflow-dev mailing list archives

From Sam Elamin <hussam.ela...@gmail.com>
Subject Re: Airflow Testing Library
Date Tue, 09 May 2017 19:34:54 GMT
Thanks Gerard and Laura. I have created an email thread as agreed in the
call, so let's take the discussion there. If anyone else is interested in
helping us build this library, please do get in touch!

On Tue, May 9, 2017 at 5:40 PM, Laura Lorenz <llorenz@industrydive.com>
wrote:

> Good points @Gerard. I think the distinctions you make between different
> testing considerations could help us focus our efforts. Here's my 2 cents
> in the buckets you describe; I'm wondering if any of these use cases align
> with anyone else and can help narrow our scope, and if I understood you
> right @Gerard:
>
> Regarding platform code: For our own platform code (i.e. custom Operators
> and Hooks), we have our CI platform running unittests on their construction
> and, in the case of hooks, integration tests on connectivity. The latter
> involves setting up test integration services (i.e. a test MySQL process)
> which we start up as docker containers, and we flip our airflow's
> configuration to point at them during testing using environment variables.
> From a browse of airflow's tests, it seems that operators and hooks are
> mostly unittested, with the integrations mocked or skipped (e.g.
> https://github.com/apache/incubator-airflow/blob/master/tests/contrib/hooks/test_jira_hook.py#L40-L41
> or
> https://github.com/apache/incubator-airflow/blob/master/tests/contrib/hooks/test_sqoop_hook.py#L123-L125).
> If the hook uses some other, well-tested library to actually establish the
> connection, the case can probably be made that custom operator and hook
> authors don't need integration tests; and since the normal unittest library
> is enough to handle these, they might not need to be in scope for a new
> testing library.
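>
> (For anyone following along, a minimal sketch of both styles; the hook
> class, module path, and connection URI are hypothetical, and the
> AIRFLOW_CONN_* environment variable is how airflow >= 1.7 reads
> connections from the environment:)
>
> import os
> import unittest
>
> import mock
>
> # hypothetical custom hook under test
> from mypackage.hooks.my_mysql_hook import MyMySqlHook
>
> # integration style: point the conn id at a docker-compose MySQL
> os.environ['AIRFLOW_CONN_MYSQL_TEST'] = (
>     'mysql://test:test@127.0.0.1:33061/testdb')
>
> class MyMySqlHookTest(unittest.TestCase):
>
>     @mock.patch('mypackage.hooks.my_mysql_hook.MyMySqlHook.get_conn')
>     def test_get_records(self, mock_get_conn):
>         # unit style: mock the connection so no real MySQL is needed
>         cursor = mock_get_conn.return_value.cursor.return_value
>         cursor.fetchall.return_value = [(1,)]
>         hook = MyMySqlHook(mysql_conn_id='mysql_test')
>         self.assertEqual(hook.get_records('SELECT 1'), [(1,)])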
>
> Regarding data manipulation functions of the business code:
> For us, we run tests on each operator in each DAG on CI, seeded with test
> input data and asserted against known output data, all of which we have
> compiled over time to represent different edge cases we expect or have
> seen. So this is a test at the level of the operator as described in a
> given DAG. Because we only describe edge cases we have seen or can predict,
> it's a very reactive way to handle testing at this level.
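>
> (One such test for us boils down to something like this; the dag/task ids
> and the seed/fetch/expected helpers are hypothetical stand-ins for our
> fixture machinery:)
>
> import unittest
> from datetime import datetime
>
> from airflow.models import DagBag
>
> RUN_DATE = datetime(2017, 5, 1)
>
> class TransformTaskTest(unittest.TestCase):
>
>     def test_handles_null_rows(self):
>         # seed a known edge case into the input table
>         seed_table('staging.events', 'tests/fixtures/events_with_nulls.csv')
>         task = DagBag().get_dag('etl_events').get_task('transform_events')
>         # run just this task for one execution date
>         task.run(start_date=RUN_DATE, end_date=RUN_DATE)
>         self.assertEqual(fetch_table('analytics.events'),
>                          expected_rows('tests/fixtures/events_expected.csv'))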
>
> If I understand your idea right, another way to test (or at least, surface
> errors) at this level is: given you have a DAG that is resilient against
> arbitrary data failures, your DAG should include a validation task/report
> at its end, or a test suite should run daily against the production error
> log for that DAG to surface errors your business code encountered on
> production data. I think this is really interesting and reminds me of an
> airflow video I saw once (can't remember who gave the talk) on a DAG whose
> last task self-reported error counts and rows lost. If implemented as a
> test suite you would run against production, this might be a direction we
> would want a testing library to go in.
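>
> (Something like the sketch below, with a PythonOperator as the DAG's last
> task; the count_rows helper, table names, and upstream task are made up:)
>
> from airflow.operators.python_operator import PythonOperator
>
> def validate_run(**context):
>     # hypothetical check: did we lose rows between staging and analytics?
>     lost = count_rows('staging.events') - count_rows('analytics.events')
>     if lost > 0:
>         raise ValueError('%d rows lost during transformation' % lost)
>
> validate = PythonOperator(
>     task_id='validate_output',
>     python_callable=validate_run,
>     provide_context=True,
>     dag=dag,  # the DAG object this task belongs to (assumed in scope)
> )
> validate.set_upstream(load_events)  # hypothetical final load task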
>
> Regarding the workflow correctness of the business code:
> What we set out to do on our side was a hybrid of your items 1 and 2,
> which we call "end-to-end tests": call a whole DAG against 'real' existing
> systems (though really they are test docker containers of the processes we
> need, MySQL and Neo4j specifically, which we switch our airflow to use via
> environment variables when instantiating hooks etc.), seeded with test
> input files for services that are hard to set up (i.e. third party APIs we
> ingest data from). Since the whole DAG is seeded with known input data,
> this gives us a way to compare the last output of a DAG to a known file,
> so that if any workflow change OR business logic change in the middle
> affected the final output, we would know as part of our test suite instead
> of when production breaks. In other words, a way to test for a regression
> of the whole DAG. So this is the framework we were thinking needed to be
> created, and is a direction we could go with a testing library as well.
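>
> (Roughly, the heart of such an end-to-end test for us looks like the
> sketch below; conn ids, the dag id, and file paths are made up, dag.run()
> is airflow's backfill entry point on the DAG model, and the AIRFLOW_CONN_*
> variables are how airflow >= 1.7 picks up connections from the
> environment:)
>
> import filecmp
> import os
> from datetime import datetime
>
> from airflow.models import DagBag
>
> # flip airflow to the docker-compose test services via env-var connections
> os.environ['AIRFLOW_CONN_MYSQL_DEFAULT'] = (
>     'mysql://test:test@127.0.0.1:33061/testdb')
> os.environ['AIRFLOW_CONN_NEO4J_DEFAULT'] = (
>     'http://neo4j:test@127.0.0.1:7474/')
>
> RUN_DATE = datetime(2017, 5, 1)
>
> def test_dag_end_to_end():
>     dag = DagBag().get_dag('etl_events')
>     dag.run(start_date=RUN_DATE, end_date=RUN_DATE, local=True)
>     # the whole DAG ran against seeded inputs; diff the final output
>     assert filecmp.cmp('output/final.csv', 'tests/fixtures/final_known.csv')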
>
> This doesn't get to your point of determining what workflow was used, which
> is interesting, just not a use case we have encountered yet (we only have
> deterministic DAGs). In my mind, in this case we would want a testing suite
> to be able to more or less turn some DAGs "on" against seeded input data
> and mocked or test integration services, let a scheduler go at it, and then
> check the metadata database for what workflow happened (and, if we had test
> integration services, maybe also check the output against the known output
> for the seeded input). I can definitely see your suggestion of developing
> instrumentation to inspect a followed workflow as a useful addition a
> testing library could include.
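>
> (For instance, a small helper that reads the followed workflow back out of
> airflow's metadata database; a sketch, with a made-up dag id and task id
> in the usage note:)
>
> from airflow import settings
> from airflow.models import TaskInstance
>
> def followed_workflow(dag_id, execution_date):
>     """Return (task_id, state) pairs recorded for one DAG run."""
>     session = settings.Session()
>     try:
>         tis = (session.query(TaskInstance)
>                       .filter(TaskInstance.dag_id == dag_id,
>                               TaskInstance.execution_date == execution_date)
>                       .all())
>         return sorted((ti.task_id, ti.state) for ti in tis)
>     finally:
>         session.close()
>
> # e.g. assert a fallback branch was skipped for the seeded run:
> # assert ('load_fallback', 'skipped') in followed_workflow('etl_events',
> #                                                          RUN_DATE)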
>
> To some degree our end-to-end DAG tests overlap in our workflow with your
> point 3 (UAT environment), but we've found that more useful for testing
> whether "wild data" causes uncaught exceptions or any integration errors
> with difficult-to-mock third party services, not DAG-level logic
> regressions, since the input data is unknown and thus we can't compare to
> a known output in this case, depending instead on fallible human QA or
> just accepting the DAG running with no exceptions as passing UAT.
>
> Laura
>
> On Tue, May 9, 2017 at 2:15 AM, Gerard Toonstra <gtoonstra@gmail.com>
> wrote:
>
> > Very interesting video. I was unable to take part. I watched only part of
> > it for now.
> > Let us know where the discussion is being moved to.
> >
> > The confluence does indeed seem to be the place to put final conclusions
> > and thoughts.
> >
> > For airflow, I like to make a distinction between "platform" and
> > "business" code. The platform code is the hooks and operators; it
> > provides the capabilities of what your ETL system can do. You'll test
> > this code with a lot of thoroughness, such that each component behaves
> > how you'd expect, judging from the constructor interface. Any
> > abstractions in there (like copying files to GCS) should be kept as
> > hidden as possible (retries, etc.).
> >
> > The "business" code is what runs on a daily basis. This can be divided in
> > another two concerns
> > for testing:
> >
> > 1 The workflow, the code between the data manipulation functions that
> > decides which operators get called
> > 2 The data manipulation function.
> >
> >
> > I think it's good practice to run tests on "2" on a daily basis and not
> > just once on CI. The reason is that there are too many unforeseen
> > circumstances where data can get into a bad state. So such tests
> > shouldn't run once in a highly controlled environment like CI, but run
> > daily in a less predictable environment like production, where all kinds
> > of weird things can happen that you'll be able to catch with proper
> > checks in place. Even if the checks are too rigorous, you can skip them
> > and improve on them, so that they fit what goes on in your environment
> > to your best ability.
> >
> >
> > Which mostly leaves testing workflow correctness and platform code. What
> > I had intended to do was:
> >
> > 1. Test the platform code against real existing systems (or maybe docker
> >    containers), to test their behavior in success and failure conditions.
> > 2. Create workflow scripts for testing the workflow; this probably
> >    requires some specific changes in hooks, which wouldn't call out to
> >    other systems but would just pick up small files you prepare from a
> >    testing repo and pass them around (see the sketch after this list).
> >    The test script could also simulate unavailability, etc. This relieves
> >    you of the huge responsibility of setting up systems and docker
> >    containers and loading them with data. Airflow sets up pretty quickly
> >    as a docker container and you can also start up a sample database with
> >    that. Afterwards, from a test script, you can check which workflow was
> >    followed by inspecting the database, so develop some instrumentation
> >    for that.
> > 3. Test the data manipulation in a UAT environment, mirroring the runs in
> >    production to some extent. That would be a place to verify whether the
> >    data comes out correctly and also to show people what kind of
> >    monitoring is in place to double-check that.
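> >
> > (A minimal sketch of the kind of hook change point 2 describes; all
> > names here are hypothetical:)
> >
> > import csv
> > import os
> >
> > class FileBackedHook(object):
> >     """Stand-in for a real hook in workflow tests: serves small files
> >     prepared in a testing repo instead of calling out to other systems."""
> >
> >     def __init__(self, fixture_dir, simulate_unavailable=False):
> >         self.fixture_dir = fixture_dir
> >         self.simulate_unavailable = simulate_unavailable
> >
> >     def get_records(self, name):
> >         if self.simulate_unavailable:
> >             # lets the test script exercise the DAG's failure branches
> >             raise IOError('simulated: system unavailable')
> >         with open(os.path.join(self.fixture_dir, name + '.csv')) as f:
> >             return list(csv.reader(f))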
> >
> >
> > On Tue, May 9, 2017 at 1:14 AM, Arnie Salazar <asalazar@riotgames.com>
> > wrote:
> >
> > > Scratch that. I see the whole video now.
> > >
> > > On Mon, May 8, 2017 at 3:33 PM Arnie Salazar <asalazar@riotgames.com>
> > > wrote:
> > >
> > > > Thanks Sam!
> > > >
> > > > Is there a part 2 to the video? If not, can you post the "next steps"
> > > > notes you took whenever you have a chance?
> > > >
> > > > Cheers,
> > > > Arnie
> > > >
> > > > On Mon, May 8, 2017 at 3:08 PM Sam Elamin <hussam.elamin@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi Folks
> > > >>
> > > >> For those of you who missed it, you can catch the discussion from
> > > >> the link on this tweet
> > > >> <https://twitter.com/samelamin/status/861703888298225670>
> > > >>
> > > >> Please do share and feel free to get involved, as the more feedback
> > > >> we get, the better the library we create will be :)
> > > >>
> > > >> Regards
> > > >> Sam
> > > >>
> > > >> On Mon, May 8, 2017 at 9:43 PM, Sam Elamin <hussam.elamin@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Bit late notice, but the call is happening today at 9:15 UTC, so
> > > >> > in about 30 mins or so.
> > > >> >
> > > >> > It will be recorded, but if anyone would like to join in on the
> > > >> > discussion the hangout link is
> > > >> > https://hangouts.google.com/hangouts/_/mbkr6xassnahjjonpuvrirxbnae
> > > >> >
> > > >> > Regards
> > > >> > Sam
> > > >> >
> > > >> > On Fri, 5 May 2017 at 21:35, Ali Uz <aliuz1@gmail.com> wrote:
> > > >> >
> > > >> >> I am also very interested in seeing how this turns out. Even
> > > >> >> though we don't have a testing framework in place on the project
> > > >> >> I am working on, I would very much like to contribute to some
> > > >> >> general framework for testing DAGs.
> > > >> >>
> > > >> >> As of now we are just implementing dummy tasks that test our
> > > >> >> actual tasks and verify if the given input produces the expected
> > > >> >> output. Nothing crazy and certainly not flexible in the long run.
> > > >> >>
> > > >> >>
> > > >> >> On Fri, 5 May 2017 at 22:59, Sam Elamin <hussam.elamin@gmail.com>
> > > >> >> wrote:
> > > >> >>
> > > >> >> > Haha yes Scott you are in!
> > > >> >> > On Fri, 5 May 2017 at 20:07, Scott Halgrim
> > > >> >> > <scott.halgrim@zapier.com> wrote:
> > > >> >> >
> > > >> >> > > Sounds A+ to me. By “both of you” did you include me? My
> > > >> >> > > first response was just to your email address.
> > > >> >> > >
> > > >> >> > > On May 5, 2017, 11:58 AM -0700, Sam Elamin
> > > >> >> > > <hussam.elamin@gmail.com>, wrote:
> > > >> >> > > > Ok, sounds great folks.
> > > >> >> > > >
> > > >> >> > > > Thanks for the detailed response, Laura! I'll invite both
> > > >> >> > > > of you to the group if you are happy, and we can schedule
> > > >> >> > > > a call for next week?
> > > >> >> > > >
> > > >> >> > > > How does that sound?
> > > >> >> > > > On Fri, 5 May 2017 at 17:41, Laura Lorenz
> > > >> >> > > > <llorenz@industrydive.com> wrote:
> > > >> >> > > >
> > > >> >> > > > > We do! We developed our own little in-house DAG test
> > > >> >> > > > > framework which we could share insights on, and we'd
> > > >> >> > > > > love to hear what other folks are up to. Basically we
> > > >> >> > > > > mock a DAG's input data, use the BackfillJob API
> > > >> >> > > > > directly to call a DAG in a test, and compare its
> > > >> >> > > > > outputs to the intended result given the inputs. We use
> > > >> >> > > > > docker/docker-compose to manage services, and split our
> > > >> >> > > > > dev and test stack locally so that the tests have their
> > > >> >> > > > > own scheduler and metadata database, and so that our CI
> > > >> >> > > > > tool knows how to construct the test stack as well.
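> > > >> >> > > > >
> > > >> >> > > > > (For anyone curious, the core of one of our tests looks
> > > >> >> > > > > roughly like this; the dag id and date are made up, and
> > > >> >> > > > > BackfillJob lives in airflow.jobs:)
> > > >> >> > > > >
> > > >> >> > > > > from datetime import datetime
> > > >> >> > > > > from airflow.jobs import BackfillJob
> > > >> >> > > > > from airflow.models import DagBag
> > > >> >> > > > >
> > > >> >> > > > > RUN_DATE = datetime(2017, 5, 1)
> > > >> >> > > > >
> > > >> >> > > > > dag = DagBag().get_dag('etl_events')
> > > >> >> > > > > # drive one full DAG run, then diff outputs vs fixtures
> > > >> >> > > > > BackfillJob(dag=dag, start_date=RUN_DATE,
> > > >> >> > > > >             end_date=RUN_DATE).run()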
> > > >> >> > > > >
> > > >> >> > > > > We co-opted the BackfillJob API for our own purposes
> > > >> >> > > > > here, but it seemed overly complicated and fragile to
> > > >> >> > > > > start and interact with our own in-test-process executor
> > > >> >> > > > > like we saw in a few of the tests in the Airflow test
> > > >> >> > > > > suite. So I'd be really interested in finding a way to
> > > >> >> > > > > streamline how to describe a test executor, for both the
> > > >> >> > > > > Airflow test suite and people's own DAG testing, and
> > > >> >> > > > > make that a first-class type of API.
> > > >> >> > > > >
> > > >> >> > > > > Laura
> > > >> >> > > > >
> > > >> >> > > > > On Fri, May 5, 2017 at 11:46 AM, Sam Elamin
> > > >> >> > > > > <hussam.elamin@gmail.com> wrote:
> > > >> >> > > > >
> > > >> >> > > > > > Hi All
> > > >> >> > > > > >
> > > >> >> > > > > > A few people in the Spark community are interested in
> > > >> >> > > > > > writing a testing library for Airflow. We would love
> > > >> >> > > > > > anyone who uses Airflow heavily in production to be
> > > >> >> > > > > > involved.
> > > >> >> > > > > >
> > > >> >> > > > > > At the moment (AFAIK) testing your DAGs is a bit of a
> > > >> >> > > > > > pain, especially if you want to run them in a CI
> > > >> >> > > > > > server.
> > > >> >> > > > > >
> > > >> >> > > > > > Is anyone interested in being involved in the
> > > >> >> > > > > > discussion?
> > > >> >> > > > > >
> > > >> >> > > > > > Kind Regards
> > > >> >> > > > > > Sam
> > > >> >> > > > > >
> > > >> >> > > > >
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >> >
> > > >>
> > > >
> > >
> >
>
