lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dawid Weiss <>
Subject Re: Random testing performance (theoretical)
Date Sun, 24 Jun 2012 18:46:23 GMT
Thanks Uwe. I think that guy actually refers to randomizing parameter/
variable space for testing, not generating tests at random in general.
I agree with you that any type of generational/ mutational/
metaheuristic-driven tests generation is not a good way to go in
practice. Lucene tests still have a very broad search space, even with
just randomized parameters/ data and for this that blog post would
apply (?).

Anyway, his is one argument in a broader discussion. I don't know
which theoretical approach would be ideal but I know for sure even
typical randomization of parameters reveals bugs pretty damn


On Sun, Jun 24, 2012 at 5:53 PM, Uwe Schindler <> wrote:
> Hi Dawid,
> I think we do not do completely random testing at all! Random testing (with the limits
mentioned in the article) would mean that we do very random testing with also randomly generated
tests. In contrast our tests all have a non-random background, means our tests are written
with a specific use-case in mind (there are some exemptions, I agree, see below). So with
any random parameter they basically test what a non-random junit test would also test. In
most cases the randomness is only added on top (to extend the space of parameters we test)
- so the chances to find bugs get greater. If we would test Lucene only with also randomly
generated tests, I agree we would find no bugs at all. On the other hand, classical junit
tests have no randomness at all, they just test that a specific "requirement" is implemented
correctly and returns the correct result. The problem with those tests is the fact, that by
fixing a bug to do the correct thing in test can break something else (this is especially
important for IR tests, they rely on statistics!!!). You could write a test that expects a
specific output of a lucene query and you can of course fix all the scoring logic to exactly
create that result. Of course any other similar query can return bogus results. So by executing
random queries, we can also test statistical things like ranking function works also for a
larger set of queries or larger number of documents.
> There are some exceptions: The famous TestRandomChains falls exactly in the category
you mention: They just build a random chain of analyzer components and the number of combinations
you could create is huge! With a few Tokenizers and a list of 150 TokenFilters you can create
an insane amount of TokenStream combinations (just start to multiply..., it is the formula
for the binomial coefficient). In that case, the chance to find a new bug is really small.
The number of failures at the beginning really shows that the TokenStreams were in a very
bad state! Now we have only few failures (one more today!), but I am sure there are still
billions (ore more) of combinations that break the TokenStream contracts!
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> eMail:
>> -----Original Message-----
>> From: [] On Behalf Of
>> Dawid Weiss
>> Sent: Sunday, June 24, 2012 3:42 PM
>> To:
>> Cc:
>> Subject: Random testing performance (theoretical)
>> An interesting link was given to me today:
>> "The Central Limit Theorem Makes Random Testing Hard"
>> Just recently I had a talk at a local university
>> ( and the low theoretical
>> probability of hitting _any_ sensible bug was also raised as a concern. This is in
>> stark contrast to what we see in practice, isn't it? It is an interesting
>> phenomenon because it means that (among other possible explanations) the
>> parameter/ search space of random tests is just ridden with failures and
>> exceptions and bugs and even with close-to-zero probability we still hit a lot of
>> them :)
>> Dawid
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: For additional
>> commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message