# commons-dev mailing list archives

##### Site index · List index
Message view
Top
From Phil Steitz <p...@steitz.com>
Subject Re: [math] Questions regarding probability distributions
Date Wed, 13 Oct 2004 00:39:36 GMT
```Thanks for the feedback and the contribution!

frno@bredband.net wrote:
> Hi
>
> I've been looking around for open source mathematical statistics software in
> Java in the last couple of weeks and have tested the commons-math package.
> After looking into it I have a few questions and comments.
>
> 1. The MathUtils.factorial(final int n) method throws an
> IllegalArgumentException for n=0 which is clearly incorrect since 0!=1 by
> definition.

Some may disagree with that definition (really amounts to a convention on
how to handle vacuuous products); but I think that I agree with you.
Unless others object, I will consider this a bug and make the change.
Thanks for pointing this oddity out. Would you mind opening a bugzilla
ticket to track this?
<http://issues.apache.org/bugzilla/enter_bug.cgi?product=Commons&component=Math>

>
> 2. I think the distinction between discrete and continuous probability
> distributions manifested in the interfaces DiscreteDistribution and
> ContinuousDistribution is rather questionable.

Here I don't think I agree.  Continuous and discrete distributions are
fundamentally different. See below.

> There is no requirement that
> a discrete distribution only takes on integer values so the methods of
> interface DiscreteDistribution doesn't cover all discrete distributions.

Technically, you are correct - a discrete random variable can take on any
(countable) set of values. The key difference, however, is that for a
discrete random variable X, if x is one of the values that it takes with
positive probability, p(X=x) is non-zero. Therefore, it makes sense to
have probability(x) defined (as it is) in the DiscreteDistribution
interface, but absent from the ContinuousDistribution interface.  More
care also needs to be taken in the discrete case to define and implement
p(a < X < b) type quantities including or excluding endpoints
appropriately (since it makes a difference whether or not endpoints are
included).

In fairness, the DiscreteDistribution interface only supports
integer-valued distributions. The only ones that I have ever used are
integer-valued, and any odd countable set can be mapped into the integers,
so I don't personally view this as a serious limitation.

On
> the other hand, all of the methods of ContinuousDistribution holds equally
> well for a discrete probability distribution.

With slightly different semantics, as noted above.

In my opinion, a more
> appropriate approach would be to rename ContinuousDistribution to
> ProbabilityDistribution and drop the DiscreteDistribution interface. Of
> course, it could be practical to have  convenience methods that takes integer
> arguments for the probability densities for certain distributions but then
> you can define a new interface IntegerValuedDistribution like
>
> public interface IntegerValuedDistribution extends ProbabilityDistribution  {
>     double probability(int x);
>     double cumulativeProbability(int x) throws MathException;
> }

I see your point here and it is legitimate.  I am -0 to making the change,
however, since the interface contract (for the base interface) would in
practical terms be different for the discrete case, so I think it is
better to keep them separate.

>
> 3. Wouldn't it be nice to have convenience methods for calculating
> the moments for each distribution? Something along the lines of
> public double getMoment(int order) throws MomentDoesNotExistException;
>
> for calculating the moments EX^order (if they exist)?
> At least there could be methods for obtaining the mean and variance of a
> distribution.

Yes!!  We thought about adding these to the interfaces but were concerned
that in some cases they would be hard to implement (or impossible -- if
they don't exist ;-)  There is nothing preventing us from adding them to
the implementations, however, or adding another interface or utilities to
compute them. Does anyone see a reason that we need to add them now,
before releasing 1.0?

> 4.  Since the chi-squared and exponential distributions are just special cases
> of the gamma distribution, there is no need to have separate implementation
> classes for these. In my opinion, one should avoid having multiple
> implementations of the same distribution unless there is some strong reason
> for it.

Well, the chi-square implementation wraps a gamma instance.  The
exponential implementation computes the exponential density explicitly.
In any case, the distributions are different (though related) and
sufficiently useful in their own rights that exposing them separately will
be easier for users.

>
> 5. There are quite a lot of elementary distributions missing.
> I wrote an implementation of the Poisson distribution while testing the
> package and have attached the files for it.

Yes!! That is why the framework was designed to support adding
distributions. Could you open a bugzilla ticket and attach the files
there?  I will review and get these in ASAP.

Thanks again for the feedback and contribution.

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org