commons-dev mailing list archives

From Phil Steitz <p...@steitz.com>
Subject Re: [math] inverseCumulativeProbability definition for discrete distributions
Date Sun, 09 May 2004 14:51:44 GMT
J.Pietschmann wrote:
> Phil Steitz wrote:
> 
>> Currently, the javadoc for 
>> DiscreteDistribution.inverseCumulativeProbability says
>>
>> "For this distribution, X, this method returns x such that P(X <= x) <= p."
>>
>> I think it should say that it returns the *largest* x such that
>> P(X <= x) <= p.
> 
> 
> This only matters for cases where P(X <= x) is not strictly increasing
> (it's a non-decreasing function by definition). For the majority of
> continuous distributions, P(X <= x) is strictly increasing anyway, but
> you are right, some clarification wouldn't hurt.

I was referring to the discrete case above, where you need to make a 
choice as to the definition (either max {x: P(X <= x) <= p} or
min {x: P(X <= x) >= p}). If you choose the first definition, then for p = 
1, the value is undefined, unless p = 1 is treated specially (see below).
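
To make the two candidate definitions concrete, here is a minimal sketch
in Java. It is not the commons-math implementation: the class and method
names are hypothetical, and the distribution is assumed to live on
{0, ..., n-1} with its cdf passed in as a plain array.

```java
// Hypothetical illustration only; not part of commons-math.
class DiscreteInverseCdfSketch {

    /**
     * First definition: max {x : P(X <= x) <= p}.
     * cdf[x] is assumed to hold P(X <= x) for x = 0 .. cdf.length - 1.
     * Returns -1 when no support point satisfies the condition.
     */
    static int largestAtMost(double[] cdf, double p) {
        int result = -1;
        for (int x = 0; x < cdf.length; x++) {
            if (cdf[x] <= p) {
                result = x;
            }
        }
        return result;
    }

    /**
     * Second definition: min {x : P(X <= x) >= p}.
     * Falls back to the last support point if rounding keeps
     * cdf[x] just below p everywhere.
     */
    static int smallestAtLeast(double[] cdf, double p) {
        for (int x = 0; x < cdf.length; x++) {
            if (cdf[x] >= p) {
                return x;
            }
        }
        return cdf.length - 1;
    }
}
```

With cdf = {0.25, 0.75, 1.0} and p = 0.5, the first definition gives 0
and the second gives 1. In this example the two agree only when p is hit
exactly by the cdf (p = 0.75 gives 1 under both). Because the support
here is finite, p = 1 is harmless; the trouble with the first definition
only shows up when the support is unbounded, since then every x
satisfies P(X <= x) <= 1 and there is no largest one.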

> 
>> Assuming the above definition, the method should be undefined for p = 
>> 1. I would like to change the javadoc as above and modify the guard to 
>> throw IllegalArgumentException when p = 1.  Any objections?
> 
> 
> If it's documented, the function can return the lowest x for which
> P(X <= x)=1. I think this could be of interest sometimes. Well,
> for cases where P(X <= x) is strictly increasing, the function
> will throw an overflow for p<eps1 and p>1-eps2 anyway.

This amounts to the second definition above in the discrete case and if we 
did this, we might want to use that definition uniformly, which would 
change current behavior. My preference here is to use the first definition 
and treat p = 1 specially, returning the critical value in the finite 
support case and throwing a MathException in the unbounded case. In any 
case, we need to make the definition precise and make sure that the 
implementation matches it.
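
The preference stated above (first definition, with p = 1 handled as a
special case) could be sketched roughly as follows. This is only an
illustration under stated assumptions: the class, the method signature
and the exception type are hypothetical, the exception is an unchecked
stand-in for MathException so that the sketch compiles on its own, and
the support is assumed to be a run of consecutive integers starting at
`lower`.

```java
import java.util.OptionalInt;
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration only; not part of commons-math.
class GuardedDiscreteInverseCdf {

    /** Unchecked stand-in for MathException (an assumption of this sketch). */
    static class UnboundedSupportException extends RuntimeException {
        UnboundedSupportException(String message) {
            super(message);
        }
    }

    /**
     * Returns max {x : P(X <= x) <= p}, treating p = 1 specially.
     *
     * @param cdf   x -> P(X <= x)
     * @param lower smallest support point
     * @param upper largest support point, or empty if support is unbounded
     * @param p     probability in [0, 1]
     */
    static int inverseCumulativeProbability(IntToDoubleFunction cdf, int lower,
                                            OptionalInt upper, double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        if (p == 1.0) {
            if (upper.isPresent()) {
                return upper.getAsInt(); // finite support: the critical value
            }
            throw new UnboundedSupportException(
                "no largest x with P(X <= x) <= 1 on unbounded support");
        }
        // For p < 1 the cdf eventually exceeds p, so the scan terminates.
        int x = lower;
        while ((!upper.isPresent() || x <= upper.getAsInt())
                && cdf.applyAsDouble(x) <= p) {
            x++;
        }
        return x - 1; // lower - 1 signals that even P(X <= lower) > p
    }
}
```

For a geometric-style cdf P(X <= x) = 1 - 0.5^(x+1) on x >= 0, passed
with an empty upper bound, p = 0.75 yields 1 and p = 1 throws. For a
distribution on {0, 1, 2}, p = 1 simply returns 2.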

For the continuous case, as you point out above, the two definitions will 
usually agree (we might want to make this a requirement for continuous 
distributions).  In this case, p = 1 presents a different problem.  For 
distributions with unbounded support, we should return 
Double.POSITIVE_INFINITY, Double.NaN or throw MathException.  Currently, 
the continuous distributions that we have implemented look like they are 
trying to return Double.MAX_VALUE, which is practically appealing (viewed 
from the left ;-) but arguably incorrect mathematically (when I say 
"trying to" I mean that the algorithm should return that, but numerical 
complications lead to values like 14.5 returned for N(2.1, 1.4)). My vote 
here is to allow p = 1 for the continuous case, returning the critical 
value when support is bounded and throwing MathException when unbounded 
(since the inverse cdf should return something in the domain of the cdf). 
I would also be OK with leaving this as is, but documenting the fact that
for p = 1, when support is unbounded, what is returned is the largest x
such that P(X < x) is not distinguishable from 1. Here again, the
important thing is to make a choice and document the behaviour.
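
As a rough illustration of the continuous-case proposal (return the
upper bound when support is bounded, fail when it is unbounded), here is
a small sketch using closed-form inverses for a uniform and an
exponential distribution. The class and method names are hypothetical,
and ArithmeticException is used purely as a self-contained stand-in for
MathException.

```java
// Hypothetical illustration only; not part of commons-math.
class ContinuousInverseCdfSketch {

    private static void checkProbability(double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
    }

    /** Uniform on [a, b]: bounded support, so p = 1 maps to b. */
    static double uniformInverse(double a, double b, double p) {
        checkProbability(p);
        return a + p * (b - a);
    }

    /**
     * Exponential with the given mean: support is [0, infinity).
     * Without the guard, -mean * Math.log(0.0) would evaluate to
     * Double.POSITIVE_INFINITY, which lies outside the cdf's domain.
     */
    static double exponentialInverse(double mean, double p) {
        checkProbability(p);
        if (p == 1.0) {
            throw new ArithmeticException(
                "inverse cdf is undefined at p = 1 for unbounded support");
        }
        return -mean * Math.log(1.0 - p);
    }
}
```

Here uniformInverse(2.0, 5.0, 1.0) returns 5.0, while
exponentialInverse(2.0, 1.0) throws. Dropping the guard would give the
Double.POSITIVE_INFINITY alternative mentioned above instead.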


Phil




