From dev-return-9527-archive-asf-public=cust-asf.ponee.io@beam.apache.org  Fri May  4 11:34:55 2018
Return-Path: <dev-return-9527-archive-asf-public=cust-asf.ponee.io@beam.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 116FC180634
	for <archive-asf-public@cust-asf.ponee.io>; Fri,  4 May 2018 11:34:54 +0200 (CEST)
Received: (qmail 64191 invoked by uid 500); 4 May 2018 09:34:53 -0000
Mailing-List: contact dev-help@beam.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@beam.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@beam.apache.org>
List-Post: <mailto:dev@beam.apache.org>
List-Id: <dev.beam.apache.org>
Reply-To: dev@beam.apache.org
Delivered-To: mailing list dev@beam.apache.org
Received: (qmail 64181 invoked by uid 99); 4 May 2018 09:34:53 -0000
Received: from mail-relay.apache.org (HELO mailrelay1-lw-us.apache.org) (207.244.88.152)
    by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 04 May 2018 09:34:53 +0000
Received: from [192.168.39.165] (154.92-14-84.ripe.coltfrance.com [84.14.92.154])
	by mailrelay1-lw-us.apache.org (ASF Mail Server at mailrelay1-lw-us.apache.org) with ESMTPSA id 9E7C676C
	for <dev@beam.apache.org>; Fri,  4 May 2018 09:34:52 +0000 (UTC)
Message-ID: <1525426491.3935.2.camel@apache.org>
Subject: Re: ValidatesRunner test cleanup
From: Etienne Chauchot <echauchot@apache.org>
To: dev@beam.apache.org
Date: Fri, 04 May 2018 11:34:51 +0200
In-Reply-To: <CANTXtWof-3vw3vtq_gwOxSK1UiCDdV9obGFyNtTL3gUm=Ph50Q@mail.gmail.com>
References: 
	<CANTXtWof-3vw3vtq_gwOxSK1UiCDdV9obGFyNtTL3gUm=Ph50Q@mail.gmail.com>
Content-Type: multipart/alternative; boundary="=-Vb9n+PzI+WJmK13+d6h8"
X-Mailer: Evolution 3.18.5.2-0ubuntu3.2 
Mime-Version: 1.0


--=-Vb9n+PzI+WJmK13+d6h8
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Scott, thanks for that !
I only quickly looked at the ValidatesRunner tests that I wrote (you modified none) and the ones that impact my ongoing
work (metrics). 
I think some tests in MetricsTest still need to be ValidatesRunner tests. See my comment in the review.
Etienne
> Note: if you don't care about Java runner tests, you can stop reading now.
> 
> tl;dr: I've made a pass over all @ValidatesRunner tests in pr/5218 [1] and converted many to @NeedsRunner in order to
> reduce post-commit runtime.
> 
> This is work that was long overdue and finally got my attention due to the Gradle migration. As context,
> @ValidatesRunner [2] tests construct a TestPipeline and exercise runner behavior through SDK constructs. The tests are
> written runner-agnostic so that they can be run on and validate all supported runners.
> 
> The framework for these tests is great and writing them is super-easy. But as a result, we have way too many of them--
> over 250. These tests run against all runners, and even when parallelized we see Dataflow post-commit times exceeding
> 3-5 hours [3].
> 
> When reading through these tests, we found many of them don't actually exercise runner-specific behavior, and were
> simply using the TestPipeline framework to validate SDK components. This is a valid pattern, but tests should be
> annotated with @NeedsRunner instead. With this annotation, the tests will run on only a single runner, currently
> DirectRunner.
> 
> So, PR/5218 looks at all existing @ValidatesRunner tests and conservatively converts tests which don't need to
> validate all runners into @NeedsRunner. I've also sharded out some very large test classes into scenario-based sub-
> classes. This is because Gradle parallelizes tests at the class-level, and we found a couple very large test classes
> (ParDoTest) became stragglers for the entire execution. Hopefully Gradle will soon implement dynamic splitting :)
> 
> So, the action I'd like to request from others:
> 1) If you are an author of @ValidatesRunner tests, feel free to look over the PR and let me know if I missed anything.
> Kenn Knowles is also helping out here.
> 2) If you find yourself writing new @ValidatesRunner tests, please consider whether your test is validating runner-
> provided behavior. If not, use @NeedsRunner instead.
> 
> 
> [1] https://github.com/apache/beam/pull/5218
> [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunne
> r.java 
> [3] https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend 
--=-Vb9n+PzI+WJmK13+d6h8
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

<html><head></head><body><div>Scott, thanks for that !</div><div><br></div>=
<div>I only quickly looked at the ValidatesRunner tests that I wrote (you m=
odified none) and the ones that impact my ongoing work (metrics).&nbsp;</di=
v><div>I think some tests in MetricsTest still need to be ValidatesRunner t=
ests. See my comment in the review.</div><div><br></div><div>Etienne</div><=
div><br></div><blockquote type=3D"cite"><div dir=3D"ltr"><div>Note: if you =
don't care about Java runner tests, you can stop reading now.<br></div><div=
><br></div><div>tl;dr: I've made a pass over all @ValidatesRunner tests in =
pr/5218 [1] and converted many to @NeedsRunner in order to reduce post-comm=
it runtime.</div><div><br></div>This is work that was long overdue and fina=
lly got my attention due to the Gradle migration. As context, @ValidatesRun=
ner [2] tests construct a TestPipeline and exercise runner behavior through=
 SDK constructs. The tests are written runner-agnostic so that they can be =
run on and validate all supported runners.<div><br></div><div>The framework=
 for these tests is great and writing them is super-easy. But as a result, =
we have way too many of them-- over 250. These tests run against all runner=
s, and even when parallelized we see Dataflow post-commit times exceeding 3=
-5 hours [3].</div><div><br></div><div>When reading through these tests, we=
 found many of them don't actually exercise runner-specific behavior, and w=
ere simply using the TestPipeline framework to validate SDK components. Thi=
s is a valid pattern, but tests should be annotated with @NeedsRunner inste=
ad. With this annotation, the tests will run on only a single runner, curre=
ntly DirectRunner.</div><div><div><br></div><div>So, PR/5218 looks at all e=
xisting @ValidatesRunner tests and conservatively converts tests which don'=
t need to validate all runners into @NeedsRunner. I've also sharded out som=
e very large test classes into scenario-based sub-classes. This is because =
Gradle parallelizes tests at the class-level, and we found a couple very la=
rge test classes (ParDoTest) became stragglers for the entire execution. Ho=
pefully Gradle will soon implement dynamic splitting :)</div><div><br></div=
><div>So, the action I'd like to request from others:</div><div>1) If you a=
re an author of @ValidatesRunner tests, feel free to look over the PR and l=
et me know if I missed anything. Kenn Knowles is also helping out here.</di=
v><div>2) If you find yourself writing new @ValidatesRunner tests, please c=
onsider whether your test is validating runner-provided behavior. If not, u=
se @NeedsRunner instead.</div><div><br></div><div><br></div><div>[1]&nbsp;<=
a href=3D"https://github.com/apache/beam/pull/5218">https://github.com/apac=
he/beam/pull/5218</a></div></div><div>[2]&nbsp;<a href=3D"https://github.co=
m/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/=
testing/ValidatesRunner.java">https://github.com/apache/beam/blob/master/sd=
ks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunner.java=
</a>&nbsp;</div><div>[3]&nbsp;<a href=3D"https://builds.apache.org/job/beam=
_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend">https://bu=
ilds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/bu=
ildTimeTrend</a>&nbsp;</div></div>
</blockquote></body></html>
--=-Vb9n+PzI+WJmK13+d6h8--