commons-dev mailing list archives

From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [math] Distributions over sample spaces other than R
Date Tue, 01 Nov 2011 17:23:45 GMT
On 11/1/11 1:05 AM, Mikkel Meyer Andersen wrote:
> 2011/10/30 Phil Steitz <phil.steitz@gmail.com>:
>> On 10/29/11 10:20 AM, cwinter wrote:
>>> Phil Steitz wrote:
>>>> I would say pull DiscreteDistribution out.  That is where the
>>>> difference really lies.  I have thought about suggesting that we
>>>> eliminate it altogether; but I still think there may be value in
>>>> supporting discrete distributions over sample spaces that are not
>>>> embedded in the integers.
>>>>
>>>> Phil
>>>>
>>> Empirical distributions are discrete by nature. Depending on the underlying
>>> data, the domain is usually (a subset of) the reals or the integers.
>>> However, after moving probability(double) to Distribution,
>>> DiscreteDistribution will be an empty interface. Thus there is in fact the
>>> question whether it should be eliminated. Otherwise it would be just a
>>> "flag" for discrete distributions and that's indeed independent of the
>>> sample space.
>> Maybe it would be best to eliminate IntegerDistribution then and
>> merge Distribution and ContinuousDistribution, so we have two roots
>> - DiscreteDistribution and ContinuousDistribution.   The only reason
>> really to have DiscreteDistribution is if we want to support
>> distributions of RVs over sample spaces that are not subsets of Z.
>> There does not seem to be much enthusiasm for that (i.e.
>> parameterizing the type of the domain of the distribution and pmf),
>> so we might as well just collapse Discrete and Integer.   Once you
>> pull out Discrete/Integer, there is not much value any more in
>> Distribution as a parent, so why not just drop both
>> IntegerDistribution and Distribution and move to two roots with
>> doubles / ints as domains and contracts cleaned up to deal with
>> discrete vs continuous cases consistently.
> If we have these two roots, I would propose a Distribution interface
> with, e.g., cdf and inverse cdf. Alternatively, an abstract class
> implementing a default solver for the inverse cdf. We might be able to
> make this generic, parameterising the argument to cdf and others.

The only way for this to work is to parameterize the type of the
sample space, which will then force Double to be used for the
continuous case.  Why is it so bad to have two roots?
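To make the cost concrete, here is a minimal sketch of what a parameterized common parent might look like. GenericDistribution, UnitUniform and GenericRootDemo are hypothetical names made up for illustration; none of them are existing [math] classes. The point is only that the type parameter cannot be a primitive, so the continuous case ends up bound to Double.

```java
/** Hypothetical common parent, parameterized by the sample-space type T (not a [math] interface). */
interface GenericDistribution<T> {
    /** P(X <= x). */
    double cumulativeProbability(T x);

    /** Quantile function: an element of the sample space for probability p. */
    T inverseCumulativeProbability(double p);
}

/** Uniform distribution on [0, 1]; T has to be the boxed Double, since a primitive double is not allowed. */
final class UnitUniform implements GenericDistribution<Double> {
    @Override
    public double cumulativeProbability(Double x) {
        // Clamp to [0, 1]; the argument is unboxed here.
        return Math.max(0.0, Math.min(1.0, x));
    }

    @Override
    public Double inverseCumulativeProbability(double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]: " + p);
        }
        return p; // boxed on return
    }
}

/** Small usage example for the sketch above. */
final class GenericRootDemo {
    public static void main(String[] args) {
        GenericDistribution<Double> u = new UnitUniform();
        Double q = u.inverseCumulativeProbability(0.75); // returns a boxed value
        System.out.println(u.cumulativeProbability(q));  // unboxed again inside the call
    }
}
```

Every call through the generic parent goes through Double rather than double, which is the trade-off described above.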

What exactly do we gain by having the common parent?  The inverse
cdf machinery will work only for the continuous (by that, I mean
real-valued RV) case.  Note how it is overridden now in
AbstractIntegerDistribution.  So why not just leave that alone and
separate out the discrete/integer/whatever-we-want-to-call-it case? 
Discrete distributions are fundamentally different.  They have
pmfs.  They have discrete value sets.  Inversion works differently. 
Inequalities work differently.  Why not just cleanly separate? 
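As a rough sketch of the two-root layout, assuming nothing about what the final API would look like: ContinuousRoot, DiscreteRoot, UnitExponential and FairDie below are hypothetical names, not existing [math] types. The sketch shows the differences listed above: a density on one side and a pmf on the other, an inverse cdf that solves F(x) = p versus one that returns the smallest k with F(k) >= p, and strict versus non-strict inequalities only mattering in the discrete case.

```java
/** Hypothetical root for real-valued distributions (domain: double). */
interface ContinuousRoot {
    double density(double x);
    double cumulativeProbability(double x);        // P(X <= x), equal to P(X < x)
    double inverseCumulativeProbability(double p); // x with F(x) = p
}

/** Hypothetical root for integer-valued distributions (domain: int). */
interface DiscreteRoot {
    double probability(int k);                     // pmf, P(X = k)
    double cumulativeProbability(int k);           // P(X <= k), not equal to P(X < k) in general
    int inverseCumulativeProbability(double p);    // smallest k with F(k) >= p
}

/** Exponential distribution with rate 1; the inverse cdf has a closed form. */
final class UnitExponential implements ContinuousRoot {
    public double density(double x) {
        return x < 0 ? 0.0 : Math.exp(-x);
    }
    public double cumulativeProbability(double x) {
        return x < 0 ? 0.0 : 1.0 - Math.exp(-x);
    }
    public double inverseCumulativeProbability(double p) {
        if (p < 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1): " + p);
        }
        return -Math.log1p(-p); // solves 1 - exp(-x) = p
    }
}

/** Fair six-sided die; inversion is a search over the value set, not a root solve. */
final class FairDie implements DiscreteRoot {
    public double probability(int k) {
        return (k >= 1 && k <= 6) ? 1.0 / 6.0 : 0.0;
    }
    public double cumulativeProbability(int k) {
        return Math.max(0, Math.min(k, 6)) / 6.0;
    }
    public int inverseCumulativeProbability(double p) {
        if (p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in (0, 1]: " + p);
        }
        for (int k = 1; k < 6; k++) {
            if (cumulativeProbability(k) >= p) {
                return k; // first value whose cdf reaches p
            }
        }
        return 6;
    }
}
```

With the die, P(X < 3) is cumulativeProbability(2) while P(X <= 3) is cumulativeProbability(3), and the two differ by the mass at 3. With the exponential there is no such distinction.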

Phil
>
> In my opinion, we benefit from having one such common ancestor instead
> of two "independent" lineages.
>
>> Phil
>>> Christian
>>>
>>> --
>>> View this message in context: http://apache-commons.680414.n4.nabble.com/math-Distributions-over-sample-spaces-other-than-R-tp3931349p3951273.html
>>> Sent from the Commons - Dev mailing list archive at Nabble.com.
>>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

