commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex D Herbert (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (RNG-52) PoissonSampler allows mean above Integer.MAX_VALUE
Date Mon, 24 Sep 2018 10:38:00 GMT

    [ https://issues.apache.org/jira/browse/RNG-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625626#comment-16625626
] 

Alex D Herbert commented on RNG-52:
-----------------------------------

In short, I agree.

Here're my thoughts:

The fix RNG-51 added a check and Javadoc that a mean above {{Integer.Max_VALUE}} is invalid.
However the {{SmallMeanPoissonSampler}} has no check and the limit of {{Integer.MAX_VALUE}}
does cause truncation.

It should be noted that the convolution of two Poisson distributions of means {{a}} and {{b}}
is a Poisson distribution of mean {{a+b}}, see:

[Convolution of Probability Distributions|https://en.wikipedia.org/wiki/List_of_convolutions_of_probability_distributions]

This is how the {{LargeMeanPoissonSampler}} is able to cache pre-computed state. It caches
state for the mean converted to an integer and when sampling adds a sample from a Poisson
distribution using the remaining mean fraction.

So in theory any large limit could be chosen and specified in the Javadoc. It could then be
stated that to obtain a Poisson sample for a mean larger than the limit requires obtaining
samples from distributions with means below the limit using two or more means that sum to
the desired mean and summing the resulting sample values.

The standard deviation for a Poisson distribution with mean {{0.5 * Integer.MAX_VALUE}} would
be {{sqrt(0.5 * Integer.MAX_VALUE) = 32768}}. Assuming a Poisson distribution is approximated
by a Gaussian at large mean then the upper range of the sample ({{Integer.MAX_VALUE}}) is
over 30000 SD units above the mean. The likelihood of this is probably beyond the limit that
can be computed using {{double}} precision. So {{0.5 * Integer.MAX_VALUE}} is a very conservative
upper limit. However if a user of the library hits this level and gets an exception it should
give them pause for thought about what they are trying to achieve in their use case.

All this could be added to the Javadoc.

 

> PoissonSampler allows mean above Integer.MAX_VALUE
> --------------------------------------------------
>
>                 Key: RNG-52
>                 URL: https://issues.apache.org/jira/browse/RNG-52
>             Project: Commons RNG
>          Issue Type: Bug
>          Components: sampling
>    Affects Versions: 1.1
>            Reporter: Alex D Herbert
>            Priority: Major
>             Fix For: 1.2
>
>
> The {{PoissonSampler}} is limited to returning an integer by the interface of the {{DiscreteSampler}}.
As it stands an input mean above {{Integer.MAX_VALUE}} is allowed although it makes no sense
as the Poisson distribution is significantly truncated.
> The algorithm of the {{SmallMeanPoissonSampler}} sets a limit on the returned sample
of {{Integer.MAX_VALUE}}. The algorithm is valid although run-time would be impractical due
to the nature of the algorithm. However at high mean (>40) the end user is expected to
use either the {{LargeMeanPoissonSampler}} directly or the {{PoissonSampler}} which chooses
the appropriate large mean algorithm.
> However the current {{LargeMeanPoissonSampler}} uses {{(int)Math.floor(mean)}} during
initialisation and any mean above {{Integer.MAX_VALUE}} would therefore be unsupported.
> I propose to add this to the constructor of each Poisson sampler:
> {code:java}
> if (mean > Integer.MAX_VALUE) {
>     throw new IllegalArgumentException(mean + " > " + Integer.MAX_VALUE);
> }
> {code}
> with documentation
> {code:java}
>  * @throws IllegalArgumentException if {@code mean <= 0} or {@code mean > }{@link
Integer.MAX_VALUE}.
> {code}
> It is noted that the limit of {{Integer.MAX_VALUE}} would allow the samples to reflect
the Poisson distribution below that level but truncate it above that level to represent the
remaining cumulative histogram at the single point of {{Integer.MAX_VALUE}}. This maintains
the functionality of the sampler within the contract of the integer value returned by {{DiscreteSampler}}.
> In practice the Poisson distribution is unlikely to be used at such a high mean; in this
case it is appropriate to use a Gaussian approximation to the Poisson.
> Note: Currently there is no code coverage from tests for the \{{LargeMeanPoissonSampler}}
checking if the mean is <= 0. Tests should be added to check the constructor does throw
when a bad mean is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message