commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gilles (JIRA)" <>
Subject [jira] [Commented] (MATH-1154) Statistical tests in stat.inference package are very slow due to implicit RandomGenerator initialization
Date Wed, 08 Oct 2014 11:16:33 GMT


Gilles commented on MATH-1154:


bq. I mean, seriously, I am fed up with discussions like this.

Same here (because it takes at least two to discuss).
You did what you did, then ignored my suggestions, then rephrased your comments to look like
a confrontation rather than a different proposal to handle _this_ issue (which you did not
even consider).  Please learn how to read:  In my *first* comment, I gave my opinion on the
part of Otmar's patch that provides a new feature (delegating RNG), and in my *second* comment,
I indirectly acknowledged that your patch was fine to fix _this_ issue. I only advocated to
not mix different (IMHO) things here (thus, implicitly agreeing, again, that other issues
should be better discussed on the ML).
Rather than repeating what you did, you could have made a mental "diff" with what I actually
wrote, and you'd have perhaps seen that there wasn't such a big difference.  At the very least,
none that warranted this outburst of resentment.


bq. the root issue really is MATH-1124

I don't think so, as I've explained above. Unless I'm mistaken, Otmar's proposal allows to
keep all fileds "final" in the distribution classes:  From their perspective, the RNG is a
black box, and whether an implementation uses lazy initialization (thus whether _its_ fields
are "final" or not) is totally irrelevant.

bq. try to reduce the initialization cost of the default

That's what Otmar's proposal (i.e. the new "LazyInitRNG") does.

bq. The OPs patch adds complexity, IMO, without really addressing the core problem

Then I don't know what the "core problem" is.

bq. moving back to MATH-1124

That issue is named "Instances of AbstractRealDistribution require a random generator", and
you yourself put an end to it by making clear that the statement is false.
As said by you (on MATH-1224), and by Thomas (and me) here, this issue is fixed by setting
the RNG argument to "null". Why do you call that a "smelly workaround" whereas it's a quite
clear and legitimate assessement on the caller's part?
Do we talk about "aesthetics"? Then, I can certainly agree that (depending on the type of
regular usage one is used to) it might look ugly to often call a constructor with a "null"
I vaguely recall that we discarded a more flexible separation of concerns (through inheritance
and/or additional interfaces) as unnecessarily complex. On the other hand, I recall clearly
that everyone agreed that "sampling" was part of the concept of a distribution.

I do not deny that there is a link with Otmar's proposal, but only in the sense that his new
feature would satisfy users with "mixed" needs (and do not want to use separate instances
for when they need sampling and when they don't).

> Statistical tests in stat.inference package are very slow due to implicit RandomGenerator
> --------------------------------------------------------------------------------------------------------
>                 Key: MATH-1154
>                 URL:
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.3
>            Reporter: Otmar Ertl
>         Attachments: MATH-1154.patch, math3.patch
> Some statistical tests defined in the stat.inference package (e.g. BinomialTest or ChiSquareTest)
are unnecessarily very slow (up to a factor 20 slower than necessary). The reason is the implicit
slow initialization of a default (Well19937c) random generator instance each time a test is
performed. The affected tests create some distribution instance in order to use some methods
defined therein. However, they do not use any method for random generation. Nevertheless a
random number generator instance is automatically created when creating a distribution instance,
which is the reason for the serious slowdown. The problem is related to MATH-1124.
> There are following solutions:
> 1) Fix the affected statistical tests by passing a light-weight RandomGenerator implementation
(or even null) to the constructor of the distribution.
> 2) Or use for all distributions a RandomGenerator implementation that uses lazy initialization
to generate the Well19937c instance as late as possible. This would also solve MATH-1124.
> I will attach a patch proposal together with a performance test, that will demonstrate
the speed up after a fix.

This message was sent by Atlassian JIRA

View raw message