commons-dev mailing list archives

From Phil Steitz <p...@steitz.com>
Subject Re: [math] inverseCumulativeProbability definition for discrete distributions
Date Sun, 09 May 2004 14:51:44 GMT
J.Pietschmann wrote:
> Phil Steitz wrote:
>
>> DiscreteDistribution.inverseCumulativeProbability says
>>
>> "For this distribution, X, this method returns x such that P(X <= x) <= p."
>>
>> I think it should say that it returns the *largest* x such that P(X <= x) <= p.
>
>
> This only matters for cases where P(X <= x) is not strictly increasing
> (it's a non-decreasing function by definition). For the majority of
> continuous distributions, P(X <= x) is strictly increasing anyway, but
> you are right, some clarification wouldn't hurt.

I was referring to the discrete case above, where you need to make a
choice as to the definition (either max {x: P(X <= x) <= p} or
min {x: P(X <= x) >= p}). If you choose the first definition, then for p =
1, the value is undefined, unless p = 1 is treated specially (see below).
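
To make the two candidate definitions concrete, here is a minimal sketch in Java. The class name, the method names and the three-point distribution are all hypothetical, chosen only for illustration; none of this is the library's code. Both searches are restricted to the support {0, 1, 2}. Without that restriction, the set {x: P(X <= x) <= 1} has no largest element, which is the p = 1 problem described above.

```java
import java.util.OptionalInt;

/** Hypothetical illustration class; not part of commons-math. */
final class DiscreteInverseCdfSketch {
    // Support is {0, 1, 2}. The probabilities are exact binary fractions, so
    // the cumulative sums 0.25, 0.5 and 1.0 compare without rounding error.
    static final double[] PMF = {0.25, 0.25, 0.5};

    /** P(X <= x) for x >= 0; values past the support give 1.0. */
    static double cdf(int x) {
        double sum = 0.0;
        for (int i = 0; i <= x && i < PMF.length; i++) {
            sum += PMF[i];
        }
        return sum;
    }

    /** First definition: max {x in support : P(X <= x) <= p}; empty if no such x. */
    static OptionalInt largestAtMost(double p) {
        OptionalInt result = OptionalInt.empty();
        for (int x = 0; x < PMF.length; x++) {
            if (cdf(x) <= p) {
                result = OptionalInt.of(x);
            }
        }
        return result;
    }

    /** Second definition: min {x in support : P(X <= x) >= p}; empty if no such x. */
    static OptionalInt smallestAtLeast(double p) {
        for (int x = 0; x < PMF.length; x++) {
            if (cdf(x) >= p) {
                return OptionalInt.of(x);
            }
        }
        return OptionalInt.empty();
    }
}
```

In this sketch the two definitions agree whenever p is exactly a value taken by the cdf (p = 0.5 gives 1 under both) and differ otherwise (p = 0.3 gives 0 under the first and 1 under the second). For p below the smallest cdf value, the first definition has no answer at all.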

>
>> Assuming the above definition, the method should be undefined for p =
>> 1. I would like to change the javadoc as above and modify the guard to
>> throw IllegalArgumentException when p = 1.  Any objections?
>
>
> If it's documented, the function can return the lowest x for which
> P(X <= x)=1. I think this could be of interest sometimes. Well,
> for cases where P(X <= x) is strictly increasing, the function
> will throw an overflow for p<eps1 and p>1-eps2 anyway.

This amounts to the second definition above in the discrete case and if we
did this, we might want to use that definition uniformly, which would
change current behavior. My preference here is to use the first definition
and treat p = 1 specially, returning the critical value in the finite
support case and throwing a MathException in the unbounded case. In any
case, we need to make the definition precise and make sure that the
implementation matches it.
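
The preferred behaviour for p = 1 could be sketched as a guard that runs before the usual search. This is only an illustration under stated assumptions: `InverseCdfGuardSketch`, `guard` and `SketchMathException` are hypothetical names, the exception is a stand-in for the library's MathException so that the example is self-contained, and a `null` upper bound is an invented convention for "unbounded support".

```java
/** Hypothetical sketch; the names are illustrative, not commons-math API. */
final class InverseCdfGuardSketch {
    /** Stand-in for the library's MathException, to keep the sketch self-contained. */
    static class SketchMathException extends Exception {
        SketchMathException(String message) {
            super(message);
        }
    }

    /**
     * Validates p and handles p = 1 before any search is attempted.
     *
     * @param p          probability argument
     * @param upperBound largest support point, or null if support is unbounded
     * @return the special-case answer for p = 1, or null when the caller
     *         should run its normal search for max {x : P(X <= x) <= p}
     */
    static Integer guard(double p, Integer upperBound) throws SketchMathException {
        if (Double.isNaN(p) || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]: " + p);
        }
        if (p == 1.0) {
            if (upperBound == null) {
                // Unbounded support: no finite x is the largest with P(X <= x) <= 1.
                throw new SketchMathException("no finite value for p = 1");
            }
            // Finite support: return the critical value, i.e. the top of the support.
            return upperBound;
        }
        return null;
    }
}
```

Whatever shape the real guard takes, the point of the sketch is that the p = 1 branch is decided by the support alone, so it can be documented independently of the search used for p < 1.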

For the continuous case, as you point out above, the two definitions will
usually agree (we might want to make this a requirement for continuous
distributions).  In this case, p = 1 presents a different problem.  For
distributions with unbounded support, we should return
Double.POSITIVE_INFINITY, Double.NaN or throw MathException.  Currently,
the continuous distributions that we have implemented look like they are
trying to return Double.MAX_VALUE, which is practically appealing (viewed
from the left ;-) but arguably incorrect mathematically (when I say
"trying to" I mean that the algorithm should return that, but numerical
complications lead to values like 14.5 returned for N(2.1, 1.4)). My vote
here is to allow p = 1 for the continuous case, returning the critical
value when support is bounded and throwing MathException when unbounded
(since the inverse cdf should return something in the domain of the cdf).
I would also be OK leaving as is, but documenting the fact that for p =
1, when support is unbounded, what is returned is the largest x such that
P(X < x) is not distinguishable from 1. Here again, the important thing is
to make a choice and document the behaviour.
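
As a rough illustration of "not distinguishable from 1", the sketch below bisects for the point at which a cdf evaluates to exactly 1.0 in double arithmetic. The exponential distribution with rate 1 is an assumption made only to keep the example short (it is not one of the cases discussed above), and the class and method names are hypothetical.

```java
/** Hypothetical sketch: where a cdf saturates at 1.0 in double precision. */
final class CdfSaturationSketch {
    /** cdf of an exponential distribution with rate 1 (chosen only as an example). */
    static double cdf(double x) {
        return x <= 0.0 ? 0.0 : 1.0 - Math.exp(-x);
    }

    /**
     * Bisects for the smallest x in [lo, hi], to within tol, at which
     * cdf(x) == 1.0 exactly. Assumes cdf(lo) < 1.0 and cdf(hi) == 1.0.
     */
    static double saturationPoint(double lo, double hi, double tol) {
        while (hi - lo > tol) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid) == 1.0) {
                hi = mid;   // already saturated: the boundary is at or below mid
            } else {
                lo = mid;   // still below 1.0: the boundary is above mid
            }
        }
        return hi;
    }
}
```

Calling `CdfSaturationSketch.saturationPoint(0.0, 100.0, 1e-6)` should land somewhere around x = 37, although the support is unbounded. Every larger x gives the same cdf value of 1.0, so a numerical inverse at p = 1 has no unique answer, and the value a particular algorithm returns depends on its search details.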

Phil
