commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [MATH] Test failure in Continuum
Date Mon, 20 Aug 2012 20:14:22 GMT
On 8/7/12 9:36 AM, Phil Steitz wrote:
> On 8/6/12 11:16 PM, Dennis Hendriks wrote:
>> See below.
>>
>> On 08/06/2012 05:29 PM, Phil Steitz wrote:
>>>
>>>
>>>
>>> On Aug 6, 2012, at 6:06 AM, Dennis Hendriks<D.Hendriks@tue.nl> 
>>> wrote:
>>>
>>>> See below.
>>>>
>>>> Dennis
>>>>
>>>>
>>>> On 08/06/2012 02:48 PM, Phil Steitz wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Aug 5, 2012, at 11:21 PM, Dennis
>>>>> Hendriks<D.Hendriks@tue.nl>   wrote:
>>>>>
>>>>>> See below.
>>>>>>
>>>>>> On 08/06/2012 12:49 AM, Gilles Sadowski wrote:
>>>>>>> On Sun, Aug 05, 2012 at 12:54:11PM -0700, Phil Steitz wrote:
>>>>>>>> On 8/4/12 10:57 AM, Gilles Sadowski wrote:
>>>>>>>>> Hello.
>>>>>>>>>
>>>>>>>>> Referring to this failed test (cf. messages from Continuum):
>>>>>>>>> ---CUT---
>>>>>>>>> org.apache.commons.math3.exception.NumberIsTooLargeException:
>>>>>>>>> lower bound (65) must be strictly less than upper bound
(65)
>>>>>>>>>     at
>>>>>>>>> org.apache.commons.math3.distribution.UniformIntegerDistribution.<init>(UniformIntegerDistribution.java:73)
>>>>>>>>>     at
>>>>>>>>> org.apache.commons.math3.distribution.UniformIntegerDistribution.<init>(UniformIntegerDistribution.java:53)
>>>>>>>>>     at
>>>>>>>>> org.apache.commons.math3.stat.descriptive.AggregateSummaryStatisticsTest.generatePartition(AggregateSummaryStatisticsTest.java:275)
>>>>>>>>>     at
>>>>>>>>> org.apache.commons.math3.stat.descriptive.AggregateSummaryStatisticsTest.testAggregationConsistency(AggregateSummaryStatisticsTest.java:89)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It is due to a precondition check while creating the
>>>>>>>>> "UniformIntegerDistribution" instance:
>>>>>>>>> ---CUT---
>>>>>>>>> if (lower>= upper) {
>>>>>>>>>      throw new NumberIsTooLargeException(
>>>>>>>>>                        
>>>>>>>>> LocalizedFormats.LOWER_BOUND_NOT_BELOW_UPPER_BOUND,
>>>>>>>>>                         lower, upper, false);
>>>>>>>>> }
>>>>>>>>> ---CUT---
>>>>>>>>>
>>>>>>>>> The test referred to above was using this code (before
I
>>>>>>>>> changed it use a
>>>>>>>>> "UniformIntegerDistribution" instance):
>>>>>>>>> ---CUT---
>>>>>>>>> final int next = (i == 4 || cur == length - 1) ? length
- 1
>>>>>>>>> : randomData.nextInt(cur, length - 1);
>>>>>>>>> ---CUT---
>>>>>>>>>
>>>>>>>>> It is now (after the change):
>>>>>>>>> ---CUT---
>>>>>>>>> final IntegerDistribution partitionPoint = new
>>>>>>>>> UniformIntegerDistribution(cur, length - 1);
>>>>>>>>> final int next = (i == 4 || cur == length - 1) ? length
- 1
>>>>>>>>> : partitionPoint.sample();
>>>>>>>>> ---CUT---
>>>>>>>>>
>>>>>>>>> Thus, AFAIK, the failure did not appear before because
>>>>>>>>> there was no
>>>>>>>>> precondition enforcement in "nextInt".
>>>>>>>>>
>>>>>>>>> The question is: Was the code in the test correct (in
>>>>>>>>> allowing the same
>>>>>>>>> value for both bounds?
>>>>>>>>>   * In the negative, how to change it?
>>>>>>>>>   * The affirmative would mean that the precondition
check in
>>>>>>>>>     "UniformIntegerDistribution" should be relaxed to
allow
>>>>>>>>> equal bounds.
>>>>>>>>>     Does this make sense?
>>>>>>>>>     If so, can we change it now, or is it forbidden in
>>>>>>>>> order to stay
>>>>>>>>>     backwards compatible?
>>>>>>>> Your analysis above is correct.  The failure after the
>>>>>>>> change is due
>>>>>>>> to the fact that post-change the distribution is
>>>>>>>> instantiated before
>>>>>>>> the bounds check.  I changed the test to fix this.
>>>>>>> Thanks.
>>>>>>>
>>>>>>>>   Both the
>>>>>>>> randomData nextInt and the UniformIntegerDistribution
>>>>>>>> constructor
>>>>>>>> now forbid the degenerate case where there is only one point
>>>>>>>> in the
>>>>>>>> domain.  In retrospect, I guess it would have probably been
>>>>>>>> better
>>>>>>>> to allow this degenerate case.  Unfortunately, this would
be an
>>>>>>>> incompatible change, so will have to wait until 4.0 if we
>>>>>>>> want to do it.
>>>>>>>>
>>>>>>>> The original code above illustrates the convenience of being
>>>>>>>> able to
>>>>>>>> just make direct calls to randomData.nextXxx, which is why
this
>>>>>>>> class exists ;)
>>>>>>> As I wrote in another post, I'm not against the convenience
>>>>>>> methods. But
>>>>>>> IMO, they should be located in a new "DistributionUtils" class.
>>>>>>> And we should also find a way to remove the code duplication
>>>>>>> (in the
>>>>>>> distribution's "sample()" method and in the corresponding
>>>>>>> "next..." method).
>>>>>>>
>>>>>> The RandomData class (or whatever it would be called) does
>>>>>> indeed seem useful. If we plan to keep it, we should probably
>>>>>> make sure that there is a sample/next/... method in that class
>>>>>> for EVERY distribution, as some of them are missing, if I
>>>>>> remember correctly. Perhaps this is a separate issue though?
>>>>>>
>>>>> All have the method now, but the impls delegate to
>>>>> RandomDataImpl.  In some cases, there is nothing better
>>>>> implemented than just inversion, provided by the default
>>>>> inversion sampler.  That is OK.  What we need to do is just
>>>>> move the implementations of the default and specialized
>>>>> samplers to the actual distribution classes.  These can't be
>>>>> static, as they use the RamdomData instance.  I will take care
>>>>> of this.
>>>> I'm not sure if I made myself clear. I meant to say that not all
>>>> distributions have a corresponding nextX method in
>>>> RandomData(Impl). What I propose is to make sure that for every
>>>> distribution class, there is a corresponding method in
>>>> RandomData(Impl) to make sure that RandomData(Impl) is actually
>>>> a substitute for using the distributions.
>>> I get it now, but I don't think I agree.  RandomData is meant to
>>> be a general-purpose  class used for generating random data with
>>> commonly desired characteristics, like coming from commonly used
>>> distributions such as Poisson, Gaussian, etc.  This class
>>> predates the sample() method that has been added to *all*
>>> distributions.  The default implementation of sampling for real
>>> distributions (inversion-based sampling)  has no dependency on
>>> RandomDataImpl, but the specialized implementations (impls that
>>> are better than inversion) for some distributions still live
>>> there.  Here is what I would like to do:
>>>
>>> 1. Move the specialized implementations from RandomDataImpl to
>>> the distributions that they sample from.
>>>
>>> 2.  Rename RandomDataImpl to merge the impl and the interface
>>>
>>> I think we all agree on 1.  Regarding 2, I would personally
>>> prefer to leave the represented distributions as is, sticking
>>> with just the most commonly used distributions, having the
>>> methods accept parameters, but maintaining generators as instance
>>> variables so generator state can be maintained between calls
>>> (like sample() and the existing RandomDataImpl behave now).  If
>>> others feel strongly that every distribution should be
>>> represented, I am OK with that but would see it as clutter in the
>>> RD replacement.
>> I definitely do agree on 1. I also agree on 2 (the rename/merge).
>> However, I thought the goal of having RandomData is to be convenient:
>>
>>  - Distributions require a separate instance per distributions.
>> RandomData requires only a single instance.
>>
>>  - Distributions require a separate instance per combination of
>> parameters. RandomData allows providing the parameters when one
>> actually asks for samples.
>>
>> If so, then I don't see why that would be the case for only some
>> of the distributions, and not for others...? Am I missing
>> something here?
> Only that adding every distribution would clutter the class, IMO. 
> As I said above, RandomData was originally intended to be just a
> sort of souped-up Random, making the underlying PRNG pluggable and
> supporting more kinds of random data generation than what Random
> provides.  If you feel strongly that there are lots of practical
> applications for generating random data from, say Weibull or
> ChiSquare distributions, I am fine adding them all to the RandomData
> replacement.  The rationale for limiting is that it makes it a
> little easier to find the commonly used ones (exponential, poissson,
> gaussian).  If others disagree, I am OK adding them all (as I see
> they have now been added to RandomDataImpl, but not the interface).

In r1375192, I created RandomDataGenerator to replace RandomDataImpl
and included support for all distributions.  I also changed the
versions that were just using nextInversionDeviate to actually use
the configured RandomGenerator.

Phil
>
> Phil
>> Dennis
>>
>>
>>> Phil
>>>
>>>
>>>> Obviously, the implementation should be only in the
>>>> distributions OR in the RandomDataImpl. I agree that in the
>>>> distributions would be better. I like the static methods in
>>>> distributions idea.
>>>>
>>>>> Phil
>>>>>
>>>>>>> Gilles
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>>
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>>
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message