commons-issues mailing list archives

From "Phil Steitz (JIRA)" <>
Subject [jira] [Commented] (MATH-1154) Statistical tests in stat.inference package are very slow due to implicit RandomGenerator initialization
Date Tue, 07 Oct 2014 15:52:34 GMT


Phil Steitz commented on MATH-1154:

I agree that the root issue really is MATH-1124. When we decided to move sampling into the
distributions, we created the need for distribution instances to have access to a PRNG. When
we decided we wanted everything to be final, we forced ourselves into the MATH-1124 state,
where the only way to avoid potentially expensive PRNG initialization when creating distribution
instances that may never use sampling is the smelly workaround of nulling out the (final) PRNG
at construction time. Thomas' patch looks fine to me. Unless and until we change
one of the decisions above (relax our obsession with final or pull sampling back out), we should use
the workaround in the unit tests (as the patch does) and either reduce the initialization
cost of the default PRNG or find a better workaround (reopening MATH-1124). The OP's patch adds
complexity, IMO, without really addressing the core problem, which I think we should address
in MATH-1124. So I am +1 to applying Thomas' patch, resolving this issue, and moving back
to MATH-1124.
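A minimal sketch of the "null out the final PRNG" workaround, in plain Java so it runs without Commons Math. `ExpensiveGenerator` and `ToyExponential` are hypothetical stand-ins, not Commons Math classes, and the construction cost is only simulated. The sketch assumes, as described above, that the generator is a `final` field fixed in the constructor, so the only way to skip its cost is to pass `null` and give up `sample()`.

```java
import java.util.Random;

// Hypothetical stand-in for an expensive default PRNG such as Well19937c.
// The construction cost is simulated; this is not the Commons Math class.
class ExpensiveGenerator {
    static int constructed = 0;
    private final Random delegate;

    ExpensiveGenerator() {
        constructed++;
        // Simulate costly state setup by filling a large state table.
        int[] state = new int[1 << 16];
        for (int i = 1; i < state.length; i++) {
            state[i] = state[i - 1] * 1103515245 + 12345;
        }
        delegate = new Random(state[state.length - 1]);
    }

    double nextDouble() {
        return delegate.nextDouble();
    }
}

// Hypothetical distribution following the "everything final" design:
// the generator can only be chosen once, in the constructor.
class ToyExponential {
    private final ExpensiveGenerator random; // null means "sampling disabled"
    private final double mean;

    // Convenience constructor: always pays for a generator, even if unused.
    ToyExponential(double mean) {
        this(new ExpensiveGenerator(), mean);
    }

    // Workaround: callers that never sample pass null here.
    ToyExponential(ExpensiveGenerator random, double mean) {
        this.random = random;
        this.mean = mean;
    }

    // Deterministic methods never touch the generator.
    double cumulativeProbability(double x) {
        return x <= 0 ? 0.0 : 1.0 - Math.exp(-x / mean);
    }

    double sample() {
        if (random == null) {
            throw new IllegalStateException("constructed without a generator");
        }
        // Inverse-transform sampling of the exponential distribution.
        return -mean * Math.log(1.0 - random.nextDouble());
    }
}
```

For example, `new ToyExponential(null, 2.0).cumulativeProbability(2.0)` evaluates 1 - e^(-1) without ever constructing an `ExpensiveGenerator`; the price is that calling `sample()` on that instance fails, which is what makes the workaround smelly.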

> Statistical tests in stat.inference package are very slow due to implicit RandomGenerator initialization
> --------------------------------------------------------------------------------------------------------
>                 Key: MATH-1154
>                 URL:
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.3
>            Reporter: Otmar Ertl
>         Attachments: MATH-1154.patch, math3.patch
> Some statistical tests defined in the stat.inference package (e.g. BinomialTest or ChiSquareTest)
> are unnecessarily slow (up to a factor of 20 slower than necessary). The reason is the implicit,
> slow initialization of a default (Well19937c) random generator instance each time a test is
> performed. The affected tests create a distribution instance in order to use some of the methods
> defined therein. However, they do not use any method for random generation. Nevertheless, a
> random number generator instance is automatically created whenever a distribution instance is
> created, which is the reason for the serious slowdown. The problem is related to MATH-1124.
> There are the following possible solutions:
> 1) Fix the affected statistical tests by passing a lightweight RandomGenerator implementation
> (or even null) to the constructor of the distribution.
> 2) Alternatively, use for all distributions a RandomGenerator implementation that initializes
> the Well19937c instance lazily, as late as possible. This would also solve MATH-1124.
> I will attach a patch proposal together with a performance test that demonstrates
> the speedup after a fix.
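Option 2 can be sketched in plain Java as follows, without depending on Commons Math. `Rng`, `DefaultRng`, `LazyRng`, `ToyExponential` and `ToyUpperTailTest` are all hypothetical names: `Rng` is a one-method stand-in for `RandomGenerator`, and `DefaultRng` merely counts its constructions instead of doing Well19937c's real seeding. The sketch assumes the distribution's default generator is wrapped in a lazily initializing delegate, so the `final` field is preserved while CDF-only callers, such as a test that builds a fresh distribution per call, never pay for generator setup.

```java
import java.util.SplittableRandom;
import java.util.function.Supplier;

// Hypothetical minimal generator interface; Commons Math's RandomGenerator
// declares many more methods, all of which a real wrapper would forward.
interface Rng {
    double nextDouble();
}

// Hypothetical stand-in for the expensive default generator (e.g. Well19937c).
class DefaultRng implements Rng {
    static int constructed = 0;
    private final SplittableRandom delegate = new SplittableRandom(0L);

    DefaultRng() {
        constructed++; // a real default generator would do its costly seeding here
    }

    @Override
    public double nextDouble() {
        return delegate.nextDouble();
    }
}

// Defers creating the real generator until the first random number is requested.
final class LazyRng implements Rng {
    private final Supplier<Rng> factory;
    private Rng delegate; // null until first use

    LazyRng(Supplier<Rng> factory) {
        this.factory = factory;
    }

    @Override
    public synchronized double nextDouble() {
        if (delegate == null) {
            delegate = factory.get(); // construction cost is paid here, at most once
        }
        return delegate.nextDouble();
    }
}

// Hypothetical distribution: the generator field stays final, but the default
// is now a cheap wrapper instead of an eagerly built DefaultRng.
class ToyExponential {
    private final Rng random;
    private final double mean;

    ToyExponential(double mean) {
        this(new LazyRng(DefaultRng::new), mean);
    }

    ToyExponential(Rng random, double mean) {
        this.random = random;
        this.mean = mean;
    }

    double cumulativeProbability(double x) {
        return x <= 0 ? 0.0 : 1.0 - Math.exp(-x / mean);
    }

    double sample() {
        // Inverse-transform sampling of the exponential distribution.
        return -mean * Math.log(1.0 - random.nextDouble());
    }
}

// Hypothetical upper-tail test in the style of stat.inference: it builds a
// distribution per call but only ever uses the CDF.
class ToyUpperTailTest {
    static double pValue(double observed, double mean) {
        return 1.0 - new ToyExponential(mean).cumulativeProbability(observed);
    }
}
```

With an eager default, 1,000 `pValue` calls would build 1,000 generators; with the lazy wrapper they build none, and the first `sample()` on a given instance builds exactly one. A real wrapper would also have to decide how `setSeed` calls made before first use are handled and whether the synchronized first-use check is wanted, so this is a sketch of the idea rather than a drop-in patch.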

This message was sent by Atlassian JIRA
