airflow-dev mailing list archives

From Bolke de Bruin <>
Subject Cutting down on testing time - updated
Date Sat, 25 Feb 2017 18:19:05 GMT
Hi All,

(Welcome to new MacBook Pro that has a send “button” on the touch bar)

Jeremiah and I have been looking into optimising the time that is spent on tests. The reason
for this is that Travis runs are taking more and more time and we are being throttled
by Travis. As part of that we enabled color coding of test outcomes and timing of tests. The
results are kind of... surprising.

This is the top 20 of tests where we spend the most time. MySQL (remember concurrent access
enabled) - <>

tests.BackfillJobTest.test_backfill_examples: 287.9209s
tests.BackfillJobTest.test_backfill_multi_dates: 53.5198s
tests.SchedulerJobTest.test_scheduler_start_date: 36.4935s
tests.CoreTest.test_scheduler_job: 35.5852s
tests.CliTests.test_backfill: 29.7484s
tests.SchedulerJobTest.test_scheduler_multiprocessing: 26.1573s
tests.DaskExecutorTest.test_backfill_integration: 24.5456s
tests.CoreTest.test_schedule_dag_no_end_date_up_to_today_only: 17.3278s
tests.SubDagOperatorTests.test_subdag_deadlock: 16.1957s
tests.SensorTimeoutTest.test_timeout: 15.1000s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past: 13.8812s
tests.BackfillJobTest.test_cli_backfill_depends_on_past: 12.9539s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past_advance_ex_date: 12.8779s
tests.SchedulerJobTest.test_dagrun_success: 12.8177s
tests.SchedulerJobTest.test_dagrun_root_fail: 10.3953s
tests.SchedulerJobTest.test_dag_with_system_exit: 10.1132s
tests.TransferTests.test_mysql_to_hive: 8.5939s
tests.SchedulerJobTest.test_retry_still_in_executor: 8.1739s
tests.SchedulerJobTest.test_dagrun_fail: 7.9855s
tests.ImpersonationTest.test_default_impersonation: 7.4993s

Yes, we spend a whopping 5 minutes on executing all examples. Another interesting one is “tests.CoreTest.test_scheduler_job”.
This test just checks whether certain directories are created as part of logging. This
could have been covered by a real unit test covering just the function
that creates the files - now it takes 35s.
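
As a rough illustration of what such a "real unit test" could look like, here is a minimal sketch. The helper ensure_log_directory and its arguments are hypothetical stand-ins, not the actual Airflow function; the point is only that the directory-creating logic can be tested in isolation against a temporary directory, without starting a scheduler.

```python
import os
import shutil
import tempfile
import unittest


def ensure_log_directory(base_dir, dag_id, task_id):
    """Hypothetical helper: create and return <base_dir>/<dag_id>/<task_id>."""
    path = os.path.join(base_dir, dag_id, task_id)
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


class EnsureLogDirectoryTest(unittest.TestCase):
    def test_creates_nested_directories(self):
        # Work in a throwaway directory so the test touches nothing else.
        base_dir = tempfile.mkdtemp()
        try:
            path = ensure_log_directory(base_dir, "example_dag", "example_task")
            self.assertTrue(os.path.isdir(path))
            self.assertEqual(
                path, os.path.join(base_dir, "example_dag", "example_task")
            )
        finally:
            shutil.rmtree(base_dir)
```

A test of this shape needs no database and no running jobs, so it should finish in milliseconds rather than tens of seconds.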

We discussed several strategies for reducing time apart from rewriting some of the tests (that
would be a herculean job!). What seems most optimal is:

1. Run the scheduler tests apart from all other tests.
2. Run “operator” integration tests in their own unit.
3. Run UI tests separately.
4. Run API tests separately.

This creates the following build matrix (warning ASCII art):

|          | Scheduler | Operators | UI | API |
| Python 2 |     x     |     x     | x  |  x  |
| Python 3 |     x     |     x     | x  |  x  |
| Kerberos |           |           | x  |  x  |
| Ldap     |           |           | x  |     |
| Hive     |           |     x     | x  |  x  |
| SSH      |           |     x     |    |     |
| Postgres |     x     |     x     | x  |  x  |
| MySQL    |     x     |     x     | x  |  x  |
| SQLite   |     x     |     x     | x  |  x  |

So from this build matrix one can deduce that Postgres and MySQL are generic services that will
be present in every build. In addition, all builds will use Python 2 and Python 3; for Python 3 I propose
using Python 3.4 and Python 3.5. The matrix can be expressed by environment variables. See
.travis.yml for the current build matrix.
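
To make "expressed by environment variables" a bit more concrete, here is a small sketch that turns the table above into one environment line per test group and database backend. The variable names TEST_GROUP, BACKEND and SERVICES are assumptions made up for illustration; they are not taken from the existing .travis.yml. The Python version is left out because Travis treats it as a separate axis.

```python
from itertools import product

# Optional services each test group needs, copied from the matrix above.
GROUP_SERVICES = {
    "scheduler": [],
    "operators": ["hive", "ssh"],
    "ui": ["kerberos", "ldap", "hive"],
    "api": ["kerberos", "hive"],
}

# Database backends that every test group runs against.
BACKENDS = ["postgres", "mysql", "sqlite"]


def build_env_lines():
    """Return one 'KEY=value ...' line per (test group, backend) pair.

    The variable names are hypothetical, chosen only for this sketch.
    """
    lines = []
    for group, backend in product(sorted(GROUP_SERVICES), BACKENDS):
        services = ",".join(GROUP_SERVICES[group])
        lines.append(
            "TEST_GROUP={} BACKEND={} SERVICES={}".format(group, backend, services)
        )
    return lines


for line in build_env_lines():
    print(line)
```

With four test groups and three backends this yields twelve lines, each of which could become one entry in the build matrix.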

Furthermore, I would like us to label our tests correctly, e.g. as unit test or integration test.
This can be done with a comment or by introducing decorators such as @unittest and @integrationtest. This
is to help reviewers and maintainers find out whether new functionality is correctly covered.
At a minimum, a unit test is required for new functionality.

What is a unit test (thanks stack overflow): A unit test is a test written by the programmer
to verify that a relatively small piece of code is doing what it is intended to do. They are
narrow in scope, they should be easy to write and execute, and their effectiveness depends
on what the programmer considers to be useful. Part of being a unit test is the implication
that things outside the code under test are mocked or stubbed out. Unit tests shouldn't have
dependencies on outside systems. They test internal consistency as opposed to proving that
they play nicely with some outside system. 

An integration test is done to demonstrate that different pieces of the system work together.
Integration tests cover whole applications, and they require much more effort to put together.
They usually require resources like database instances and hardware to be allocated for them.
The integration tests do a more convincing job of demonstrating the system works (especially
to non-programmers) than a set of unit tests can, at least to the extent the integration test
environment resembles production.

Lastly, I would like us to use the “mirror the file you are testing” layout, i.e. tests/
tests etc. This means we should stop adding tests to and migrate away from

I will create a couple of Jiras to track this.

Any thoughts?

