commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [MATH] Test failure in Continuum
Date Tue, 07 Aug 2012 16:36:44 GMT
On 8/6/12 11:16 PM, Dennis Hendriks wrote:
> See below.
>
> On 08/06/2012 05:29 PM, Phil Steitz wrote:
>>
>>
>>
>>
>> On Aug 6, 2012, at 6:06 AM, Dennis Hendriks<D.Hendriks@tue.nl> 
>> wrote:
>>
>>> See below.
>>>
>>> Dennis
>>>
>>>
>>> On 08/06/2012 02:48 PM, Phil Steitz wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On Aug 5, 2012, at 11:21 PM, Dennis
>>>> Hendriks<D.Hendriks@tue.nl>   wrote:
>>>>
>>>>> See below.
>>>>>
>>>>> On 08/06/2012 12:49 AM, Gilles Sadowski wrote:
>>>>>> On Sun, Aug 05, 2012 at 12:54:11PM -0700, Phil Steitz wrote:
>>>>>>> On 8/4/12 10:57 AM, Gilles Sadowski wrote:
>>>>>>>> Hello.
>>>>>>>>
>>>>>>>> Referring to this failed test (cf. messages from Continuum):
>>>>>>>> ---CUT---
>>>>>>>> org.apache.commons.math3.exception.NumberIsTooLargeException:
>>>>>>>> lower bound (65) must be strictly less than upper bound (65)
>>>>>>>>     at
>>>>>>>> org.apache.commons.math3.distribution.UniformIntegerDistribution.<init>(UniformIntegerDistribution.java:73)
>>>>>>>>     at
>>>>>>>> org.apache.commons.math3.distribution.UniformIntegerDistribution.<init>(UniformIntegerDistribution.java:53)
>>>>>>>>     at
>>>>>>>> org.apache.commons.math3.stat.descriptive.AggregateSummaryStatisticsTest.generatePartition(AggregateSummaryStatisticsTest.java:275)
>>>>>>>>     at
>>>>>>>> org.apache.commons.math3.stat.descriptive.AggregateSummaryStatisticsTest.testAggregationConsistency(AggregateSummaryStatisticsTest.java:89)
>>>>>>>>
>>>>>>>>
>>>>>>>> It is due to a precondition check while creating the
>>>>>>>> "UniformIntegerDistribution" instance:
>>>>>>>> ---CUT---
>>>>>>>> if (lower>= upper) {
>>>>>>>>      throw new NumberIsTooLargeException(
>>>>>>>>                        
>>>>>>>> LocalizedFormats.LOWER_BOUND_NOT_BELOW_UPPER_BOUND,
>>>>>>>>                         lower, upper, false);
>>>>>>>> }
>>>>>>>> ---CUT---
>>>>>>>>
>>>>>>>> The test referred to above was using this code (before I
>>>>>>>> changed it use a
>>>>>>>> "UniformIntegerDistribution" instance):
>>>>>>>> ---CUT---
>>>>>>>> final int next = (i == 4 || cur == length - 1) ? length -
1
>>>>>>>> : randomData.nextInt(cur, length - 1);
>>>>>>>> ---CUT---
>>>>>>>>
>>>>>>>> It is now (after the change):
>>>>>>>> ---CUT---
>>>>>>>> final IntegerDistribution partitionPoint = new
>>>>>>>> UniformIntegerDistribution(cur, length - 1);
>>>>>>>> final int next = (i == 4 || cur == length - 1) ? length -
1
>>>>>>>> : partitionPoint.sample();
>>>>>>>> ---CUT---
>>>>>>>>
>>>>>>>> Thus, AFAIK, the failure did not appear before because
>>>>>>>> there was no
>>>>>>>> precondition enforcement in "nextInt".
>>>>>>>>
>>>>>>>> The question is: Was the code in the test correct (in
>>>>>>>> allowing the same
>>>>>>>> value for both bounds?
>>>>>>>>   * In the negative, how to change it?
>>>>>>>>   * The affirmative would mean that the precondition check
in
>>>>>>>>     "UniformIntegerDistribution" should be relaxed to allow
>>>>>>>> equal bounds.
>>>>>>>>     Does this make sense?
>>>>>>>>     If so, can we change it now, or is it forbidden in
>>>>>>>> order to stay
>>>>>>>>     backwards compatible?
>>>>>>>
>>>>>>> Your analysis above is correct.  The failure after the
>>>>>>> change is due
>>>>>>> to the fact that post-change the distribution is
>>>>>>> instantiated before
>>>>>>> the bounds check.  I changed the test to fix this.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>>   Both the
>>>>>>> randomData nextInt and the UniformIntegerDistribution
>>>>>>> constructor
>>>>>>> now forbid the degenerate case where there is only one point
>>>>>>> in the
>>>>>>> domain.  In retrospect, I guess it would have probably been
>>>>>>> better
>>>>>>> to allow this degenerate case.  Unfortunately, this would be
an
>>>>>>> incompatible change, so will have to wait until 4.0 if we
>>>>>>> want to do it.
>>>>>>>
>>>>>>> The original code above illustrates the convenience of being
>>>>>>> able to
>>>>>>> just make direct calls to randomData.nextXxx, which is why this
>>>>>>> class exists ;)
>>>>>>
>>>>>> As I wrote in another post, I'm not against the convenience
>>>>>> methods. But
>>>>>> IMO, they should be located in a new "DistributionUtils" class.
>>>>>> And we should also find a way to remove the code duplication
>>>>>> (in the
>>>>>> distribution's "sample()" method and in the corresponding
>>>>>> "next..." method).
>>>>>>
>>>>>
>>>>> The RandomData class (or whatever it would be called) does
>>>>> indeed seem useful. If we plan to keep it, we should probably
>>>>> make sure that there is a sample/next/... method in that class
>>>>> for EVERY distribution, as some of them are missing, if I
>>>>> remember correctly. Perhaps this is a separate issue though?
>>>>>
>>>>
>>>> All have the method now, but the impls delegate to
>>>> RandomDataImpl.  In some cases, there is nothing better
>>>> implemented than just inversion, provided by the default
>>>> inversion sampler.  That is OK.  What we need to do is just
>>>> move the implementations of the default and specialized
>>>> samplers to the actual distribution classes.  These can't be
>>>> static, as they use the RamdomData instance.  I will take care
>>>> of this.
>>>
>>> I'm not sure if I made myself clear. I meant to say that not all
>>> distributions have a corresponding nextX method in
>>> RandomData(Impl). What I propose is to make sure that for every
>>> distribution class, there is a corresponding method in
>>> RandomData(Impl) to make sure that RandomData(Impl) is actually
>>> a substitute for using the distributions.
>>
>> I get it now, but I don't think I agree.  RandomData is meant to
>> be a general-purpose  class used for generating random data with
>> commonly desired characteristics, like coming from commonly used
>> distributions such as Poisson, Gaussian, etc.  This class
>> predates the sample() method that has been added to *all*
>> distributions.  The default implementation of sampling for real
>> distributions (inversion-based sampling)  has no dependency on
>> RandomDataImpl, but the specialized implementations (impls that
>> are better than inversion) for some distributions still live
>> there.  Here is what I would like to do:
>>
>> 1. Move the specialized implementations from RandomDataImpl to
>> the distributions that they sample from.
>>
>> 2.  Rename RandomDataImpl to merge the impl and the interface
>>
>> I think we all agree on 1.  Regarding 2, I would personally
>> prefer to leave the represented distributions as is, sticking
>> with just the most commonly used distributions, having the
>> methods accept parameters, but maintaining generators as instance
>> variables so generator state can be maintained between calls
>> (like sample() and the existing RandomDataImpl behave now).  If
>> others feel strongly that every distribution should be
>> represented, I am OK with that but would see it as clutter in the
>> RD replacement.
>
> I definitely do agree on 1. I also agree on 2 (the rename/merge).
> However, I thought the goal of having RandomData is to be convenient:
>
>  - Distributions require a separate instance per distributions.
> RandomData requires only a single instance.
>
>  - Distributions require a separate instance per combination of
> parameters. RandomData allows providing the parameters when one
> actually asks for samples.
>
> If so, then I don't see why that would be the case for only some
> of the distributions, and not for others...? Am I missing
> something here?

Only that adding every distribution would clutter the class, IMO. 
As I said above, RandomData was originally intended to be just a
sort of souped-up Random, making the underlying PRNG pluggable and
supporting more kinds of random data generation than what Random
provides.  If you feel strongly that there are lots of practical
applications for generating random data from, say Weibull or
ChiSquare distributions, I am fine adding them all to the RandomData
replacement.  The rationale for limiting is that it makes it a
little easier to find the commonly used ones (exponential, poissson,
gaussian).  If others disagree, I am OK adding them all (as I see
they have now been added to RandomDataImpl, but not the interface).

Phil
>
> Dennis
>
>
>> Phil
>>
>>
>>>
>>> Obviously, the implementation should be only in the
>>> distributions OR in the RandomDataImpl. I agree that in the
>>> distributions would be better. I like the static methods in
>>> distributions idea.
>>>
>>>> Phil
>>>>
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>>
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>>
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message