crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-542) Wider tolerance for flaky scrunch PCollectionTest
Date Thu, 16 Jul 2015 06:36:04 GMT


Gabriel Reid commented on CRUNCH-542:

FWIW, I think that just using the seeded version of the test is fine (that's what is done
in o.a.c.lib.SampleTest). Checking that it's within 5 standard deviations isn't that far away
from not checking it at all, is it?

Another option might be to do three un-seeded calls to sample and then calculate the average.
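
For illustration, here is a rough Python sketch of the two options mentioned above. The sample() function below is a hypothetical stand-in that keeps each element with a fixed probability; it is not the Crunch or Scrunch API, and the seed, collection size, and probability are made-up values.

```python
import random

def sample(items, probability, seed=None):
    """Hypothetical stand-in for sample(): keep each item with the given probability."""
    rng = random.Random(seed)  # seed=None seeds from the OS, so results vary per run
    return [x for x in items if rng.random() < probability]

items = list(range(10_000))  # made-up input size
p = 0.1                      # made-up sampling probability

# Option 1: seeded. The same seed always yields the same sample,
# so an assertion on the result cannot fail intermittently.
seeded_a = sample(items, p, seed=1729)
seeded_b = sample(items, p, seed=1729)
assert seeded_a == seeded_b

# Option 2: three un-seeded calls, averaged.
sizes = [len(sample(items, p)) for _ in range(3)]
mean_size = sum(sizes) / len(sizes)

# Under a binomial model (an assumption of this sketch), the mean of three
# independent sample sizes has a standard deviation sqrt(3) times smaller
# than that of a single sample.
expected = len(items) * p
sigma_single = (len(items) * p * (1 - p)) ** 0.5
sigma_mean = sigma_single / 3 ** 0.5
print(f"mean size {mean_size:.1f}, expected {expected:.0f}, "
      f"sigma(single) {sigma_single:.1f}, sigma(mean of 3) {sigma_mean:.1f}")
```

Under that model, keeping the tolerance at 3 single-sample standard deviations while averaging three calls corresponds to roughly 5.2 standard deviations of the averaged value, since 3 * sqrt(3) is about 5.2.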

> Wider tolerance for flaky scrunch PCollectionTest
> -------------------------------------------------
>                 Key: CRUNCH-542
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Scrunch
>    Affects Versions: 0.10.0, 0.11.0, 0.12.0
>            Reporter: Josh Wills
>            Priority: Minor
>             Fix For: 0.13.0
>         Attachments: CRUNCH-542.patch
> One of the Scrunch tests uses an unseeded version of the sample() function that verifies
> that it works correctly by ensuring that an actual sampling of elements is within ~3 standard
> deviations of the expected value. Given this, we expect the test to fail about once every
> 370 times it is run, or once a year if the tests were run every day.
> My issue is that we test about a dozen versions of Crunch automatically in Jenkins every
> day, and so I'm having this test fail on at least some version about once every month. I'd
> like to bump the control limit up to a little over 5 standard deviations so that the test
> fails around once every millennium and/or get rid of the test entirely and only rely on the
> seeded versions of the test.
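
The failure rates quoted in the description can be checked with a short calculation. This is a minimal sketch that assumes the sampled count is approximately normal and that the control limit is two-sided; the function name and the runs-per-day constant are illustrative, with the latter taken from "about a dozen versions" above.

```python
import math

def runs_between_failures(k_sigma):
    """Expected number of runs per false failure for a two-sided k-sigma limit,
    assuming the test statistic is approximately normal."""
    p_fail = math.erfc(k_sigma / math.sqrt(2))  # P(|Z| > k) for a standard normal Z
    return 1.0 / p_fail

RUNS_PER_DAY = 12  # "about a dozen versions" tested daily, per the description

for k in (3.0, 5.0):
    runs = runs_between_failures(k)
    years = runs / RUNS_PER_DAY / 365.25
    print(f"{k} sigma: one failure per {runs:,.0f} runs, about {years:,.2f} years")
```

At 3 standard deviations this gives about 370 runs per failure, or about a month at a dozen runs per day, matching the figures in the description. At exactly 5 standard deviations the same approximation gives roughly four centuries between failures, which is consistent with needing "a little over 5" to reach a millennium.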

This message was sent by Atlassian JIRA