commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Steitz <p...@steitz.com>
Subject Re: [math] Questions regarding probability distributions
Date Wed, 13 Oct 2004 00:39:36 GMT
Thanks for the feedback and the contribution!

See comments interspersed below.

frno@bredband.net wrote:
> Hi
> 
> I've been looking around for open source mathematical statistics software in 
> Java in the last couple of weeks and have tested the commons-math package.
> After looking into it I have a few questions and comments.
> 
> 1. The MathUtils.factorial(final int n) method throws an 
> IllegalArgumentException for n=0 which is clearly incorrect since 0!=1 by 
> definition.

Some may disagree with that definition (really amounts to a convention on 
how to handle vacuuous products); but I think that I agree with you. 
Unless others object, I will consider this a bug and make the change. 
Thanks for pointing this oddity out. Would you mind opening a bugzilla 
ticket to track this? 
<http://issues.apache.org/bugzilla/enter_bug.cgi?product=Commons&component=Math>

> 
> 2. I think the distinction between discrete and continuous probability 
> distributions manifested in the interfaces DiscreteDistribution and 
> ContinuousDistribution is rather questionable.

Here I don't think I agree.  Continuous and discrete distributions are 
fundamentally different. See below.

> There is no requirement that 
> a discrete distribution only takes on integer values so the methods of 
> interface DiscreteDistribution doesn't cover all discrete distributions.

Technically, you are correct - a discrete random variable can take on any 
(countable) set of values. The key difference, however, is that for a 
discrete random variable X, if x is one of the values that it takes with 
positive probability, p(X=x) is non-zero. Therefore, it makes sense to 
have probability(x) defined (as it is) in the DiscreteDistribution 
interface, but absent from the ContinuousDistribution interface.  More 
care also needs to be taken in the discrete case to define and implement 
p(a < X < b) type quantities including or excluding endpoints 
appropriately (since it makes a difference whether or not endpoints are 
included).

In fairness, the DiscreteDistribution interface only supports 
integer-valued distributions. The only ones that I have ever used are 
integer-valued, and any odd countable set can be mapped into the integers, 
so I don't personally view this as a serious limitation.

  On
> the other hand, all of the methods of ContinuousDistribution holds equally 
> well for a discrete probability distribution.

With slightly different semantics, as noted above.

  In my opinion, a more
> appropriate approach would be to rename ContinuousDistribution to 
> ProbabilityDistribution and drop the DiscreteDistribution interface. Of 
> course, it could be practical to have  convenience methods that takes integer 
> arguments for the probability densities for certain distributions but then 
> you can define a new interface IntegerValuedDistribution like 
> 
> public interface IntegerValuedDistribution extends ProbabilityDistribution  {
>     double probability(int x);
>     double cumulativeProbability(int x) throws MathException;
> }

I see your point here and it is legitimate.  I am -0 to making the change, 
however, since the interface contract (for the base interface) would in 
practical terms be different for the discrete case, so I think it is 
better to keep them separate.

> 
> 3. Wouldn't it be nice to have convenience methods for calculating
> the moments for each distribution? Something along the lines of
> public double getMoment(int order) throws MomentDoesNotExistException;
> 
> for calculating the moments EX^order (if they exist)? 
> At least there could be methods for obtaining the mean and variance of a 
> distribution.

Yes!!  We thought about adding these to the interfaces but were concerned 
that in some cases they would be hard to implement (or impossible -- if 
they don't exist ;-)  There is nothing preventing us from adding them to 
the implementations, however, or adding another interface or utilities to 
compute them. Does anyone see a reason that we need to add them now, 
before releasing 1.0?

> 4.  Since the chi-squared and exponential distributions are just special cases
> of the gamma distribution, there is no need to have separate implementation
> classes for these. In my opinion, one should avoid having multiple 
> implementations of the same distribution unless there is some strong reason 
> for it.

Well, the chi-square implementation wraps a gamma instance.  The 
exponential implementation computes the exponential density explicitly. 
In any case, the distributions are different (though related) and 
sufficiently useful in their own rights that exposing them separately will 
be easier for users.

> 
> 5. There are quite a lot of elementary distributions missing.
> I wrote an implementation of the Poisson distribution while testing the 
> package and have attached the files for it.

Yes!! That is why the framework was designed to support adding 
distributions. Could you open a bugzilla ticket and attach the files 
there?  I will review and get these in ASAP.

Thanks again for the feedback and contribution.

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Mime
View raw message