beam-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Etienne Chauchot <echauc...@apache.org>
Subject Re: ValidatesRunner test cleanup
Date Fri, 04 May 2018 09:34:51 GMT
Scott, thanks for that !
I only quickly looked at the ValidatesRunner tests that I wrote (you modified none) and the
ones that impact my ongoing
work (metrics). 
I think some tests in MetricsTest still need to be ValidatesRunner tests. See my comment in
the review.
Etienne
> Note: if you don't care about Java runner tests, you can stop reading now.
> 
> tl;dr: I've made a pass over all @ValidatesRunner tests in pr/5218 [1] and converted
many to @NeedsRunner in order to
> reduce post-commit runtime.
> 
> This is work that was long overdue and finally got my attention due to the Gradle migration.
As context,
> @ValidatesRunner [2] tests construct a TestPipeline and exercise runner behavior through
SDK constructs. The tests are
> written runner-agnostic so that they can be run on and validate all supported runners.
> 
> The framework for these tests is great and writing them is super-easy. But as a result,
we have way too many of them--
> over 250. These tests run against all runners, and even when parallelized we see Dataflow
post-commit times exceeding
> 3-5 hours [3].
> 
> When reading through these tests, we found many of them don't actually exercise runner-specific
behavior, and were
> simply using the TestPipeline framework to validate SDK components. This is a valid pattern,
but tests should be
> annotated with @NeedsRunner instead. With this annotation, the tests will run on only
a single runner, currently
> DirectRunner.
> 
> So, PR/5218 looks at all existing @ValidatesRunner tests and conservatively converts
tests which don't need to
> validate all runners into @NeedsRunner. I've also sharded out some very large test classes
into scenario-based sub-
> classes. This is because Gradle parallelizes tests at the class-level, and we found a
couple very large test classes
> (ParDoTest) became stragglers for the entire execution. Hopefully Gradle will soon implement
dynamic splitting :)
> 
> So, the action I'd like to request from others:
> 1) If you are an author of @ValidatesRunner tests, feel free to look over the PR and
let me know if I missed anything.
> Kenn Knowles is also helping out here.
> 2) If you find yourself writing new @ValidatesRunner tests, please consider whether your
test is validating runner-
> provided behavior. If not, use @NeedsRunner instead.
> 
> 
> [1] https://github.com/apache/beam/pull/5218
> [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunne
> r.java 
> [3] https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend 
Mime
View raw message