commons-dev mailing list archives

From Mikkel Meyer Andersen <m...@mikl.dk>
Subject Re: [math] EmpiricalDistribution
Date Wed, 07 Sep 2011 18:20:43 GMT
2011/9/7 Phil Steitz <phil.steitz@gmail.com>:
> On 9/6/11 8:58 AM, Mikkel Meyer Andersen wrote:
>> 2011/9/6 Phil Steitz <phil.steitz@gmail.com>:
>>> On 9/6/11 12:00 AM, Mikkel Meyer Andersen wrote:
>>>> 2011/9/5 Phil Steitz <phil.steitz@gmail.com>:
>>>>> I have a couple of proposals for this class:
>>>>>
>>>>> 0) Merge the interface and impl.   This is consistent with what we
>>>>> are doing in some other places where we have only one implementation.
>>>> Fine with me.
>>>>> 1) Extend this class to actually provide a distribution - i.e.
>>>>> implement the Distribution interface.
>>>> Won't we have problems, e.g. with implementing cumulativeProbability?
>>> The idea I had was to interpolate within bins.  So to compute the
>>> cdf at x you would find its bin, sum the mass (based on number of
>>> original sample points contained, like the sampling does) of the
>>> bins below its containing bin and then use the defined kernel within
>>> bin to determine how much of its own bin's mass to include.
>> Seems reasonable. But: We might want to include a user specified
>> support - just simple (endpoints of an interval) - or else the highest
>> and lowest value specifies the support which might not be a good idea.
>
> By the latter, do you mean just interpolate linearly between lowest
> and highest, or do you mean the lowest / highest actually observed
> points in the bin?  The first is like using a uniform kernel in the
> bins.  By "user-specified support" I guess you mean make the
> interpolation strategy pluggable somehow, right?   What launched me
> into thinking about making the kernel used for sampling configurable
> was thinking about how uniform would probably be better / more
> defensible for use interpolating the cdf in some cases.  Then you
> have to ask is it OK to use a different kernel for the sampling vs
> cdf computation.  My instinct is to say no and keep it simple -
> allow a uniform kernel to be chosen in place of the hard-coded
> Gaussian there now and then use the configured kernel for both
> sampling and cdf computation.  Even with mixed kernels, you will
> probably in most cases end up with decent fidelity between sampling
> results and the cdf; but I can imagine scenarios where Gaussian
> kernels with coarse grids could lead to funny sampling distributions
> that would not follow the linearly-interpolated cdf very well near
> grid points.
>
> Phil
"but I can imagine scenarios where Gaussian
kernels with coarse grids could lead to funny sampling distributions
that would not follow the linearly-interpolated cdf very well near
grid points."
Yes, precisely. Especially if trying to distribute the probability
mass on a discrete grid :-).
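
For reference, a minimal sketch of the bin-interpolated cdf described in the quoted text: sum the mass of the bins below x, then add a share of x's own bin. It assumes equal-width bins between the lowest and highest observation and a uniform kernel within each bin. The class and method names (BinnedCdfSketch, cumulativeProbability) are made up for illustration and are not the existing EmpiricalDistribution API.

```java
import java.util.Arrays;

/** Hypothetical sketch; not the Commons Math EmpiricalDistribution class. */
final class BinnedCdfSketch {
    private final double min;      // lowest observed value = left edge of bin 0
    private final double binWidth; // all bins have equal width (assumption)
    private final int[] counts;    // number of sample points that fell in each bin
    private final int total;       // sample size

    BinnedCdfSketch(double[] sample, int binCount) {
        if (sample.length == 0 || binCount < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one bin");
        }
        double lo = Arrays.stream(sample).min().getAsDouble();
        double hi = Arrays.stream(sample).max().getAsDouble();
        if (hi <= lo) {
            throw new IllegalArgumentException("sample must contain at least two distinct values");
        }
        this.min = lo;
        this.binWidth = (hi - lo) / binCount;
        this.counts = new int[binCount];
        this.total = sample.length;
        for (double v : sample) {
            counts[binIndex(v)]++;
        }
    }

    /** Index of the bin containing x; the maximum is clamped into the last bin. */
    private int binIndex(double x) {
        int i = (int) Math.floor((x - min) / binWidth);
        return Math.min(Math.max(i, 0), counts.length - 1);
    }

    /** Mass of all bins below x's bin plus a linear share of x's own bin. */
    double cumulativeProbability(double x) {
        if (x <= min) {
            return 0.0;
        }
        if (x >= min + binWidth * counts.length) {
            return 1.0;
        }
        int bin = binIndex(x);
        int below = 0;
        for (int i = 0; i < bin; i++) {
            below += counts[i];
        }
        // Uniform kernel: the share of the bin's mass equals the relative position in the bin.
        double fraction = (x - (min + bin * binWidth)) / binWidth;
        return (below + fraction * counts[bin]) / total;
    }

    public static void main(String[] args) {
        BinnedCdfSketch d = new BinnedCdfSketch(new double[] {1, 3, 4}, 3);
        System.out.println(d.cumulativeProbability(3.5));
    }
}
```

With this choice the support is exactly [lowest, highest] observation, which is the limitation raised above.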

To clarify what I meant by user-specified support:
If a user has observations 1, 3, 4, we would probably want to open up
for probability mass elsewhere than just at {1, 2, 3, 4} (2 is
interpolated). Then I mean that it might make sense that the user can
specify that the distribution is discrete with a support of {0,
1, 2, 3, 4, 5} (2 is interpolated and 0/5 interpolated). Similar for
continuous distributions.

Or is that too ambitious?
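
One possible reading of the discrete case, as a rough sketch only: the user passes the endpoints of an integer support, and every support point receives some mass even if it was never observed. How the mass should be spread is not settled in this thread; the sketch substitutes plain add-one (Laplace) smoothing for whatever interpolation would finally be chosen. The names DiscreteSupportSketch and smoothedMass are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a user-specified discrete support {lower, ..., upper}. */
final class DiscreteSupportSketch {

    /**
     * Returns the probability mass for every point of the support.
     * Assumption: add-one (Laplace) smoothing, so unobserved points get 1 / (n + size).
     */
    static Map<Integer, Double> smoothedMass(int[] observations, int lower, int upper) {
        if (upper < lower) {
            throw new IllegalArgumentException("upper endpoint must not be below lower endpoint");
        }
        int size = upper - lower + 1;
        int[] counts = new int[size];
        for (int obs : observations) {
            if (obs < lower || obs > upper) {
                throw new IllegalArgumentException("observation outside support: " + obs);
            }
            counts[obs - lower]++;
        }
        // Each support point gets one pseudo-observation, hence n + size in the denominator.
        double denominator = observations.length + size;
        Map<Integer, Double> mass = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            mass.put(lower + i, (counts[i] + 1) / denominator);
        }
        return mass;
    }

    public static void main(String[] args) {
        // Observations 1, 3, 4 with user-specified support {0, ..., 5}.
        System.out.println(smoothedMass(new int[] {1, 3, 4}, 0, 5));
    }
}
```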

Regarding kernels, I'm okay with only supporting uniform and Gaussian,
but we might think about it - we might come up with a clever solution
giving pluggable kernels almost for free (if we are lucky :-)).
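
A sketch of one way pluggable kernels could come nearly for free, under the assumption that a kernel is supplied as a cdf on the unit interval (nondecreasing, 0 maps to 0, 1 maps to 1). The same function then serves the cdf computation directly and the sampling through numerical inversion, so the two cannot disagree. All names here (KernelSketch, UNIFORM, BELL, massBelow, sample) are hypothetical. BELL is a Beta(2,2)-shaped stand-in and not a Gaussian; a truncated Gaussian would need an erf implementation, which the JDK does not provide.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: a within-bin kernel expressed as a cdf on [0, 1]. */
final class KernelSketch {
    /** Uniform kernel: mass grows linearly across the bin. */
    static final DoubleUnaryOperator UNIFORM = t -> t;

    /** Bell-shaped stand-in with density 6t(1-t) (Beta(2,2)); not a Gaussian. */
    static final DoubleUnaryOperator BELL = t -> t * t * (3.0 - 2.0 * t);

    /** Fraction of the mass of bin [lower, upper] that lies at or below x. */
    static double massBelow(DoubleUnaryOperator kernel, double lower, double upper, double x) {
        double t = (x - lower) / (upper - lower);
        return kernel.applyAsDouble(Math.min(1.0, Math.max(0.0, t)));
    }

    /** Draws a value from bin [lower, upper] by inverting the kernel cdf with bisection. */
    static double sample(DoubleUnaryOperator kernel, double lower, double upper, Random rng) {
        double u = rng.nextDouble();
        double lo = 0.0;
        double hi = 1.0;
        for (int i = 0; i < 60; i++) {
            double mid = 0.5 * (lo + hi);
            if (kernel.applyAsDouble(mid) < u) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return lower + 0.5 * (lo + hi) * (upper - lower);
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        System.out.println(massBelow(BELL, 2.0, 4.0, 2.5));
        System.out.println(sample(BELL, 2.0, 4.0, rng));
    }
}
```

A user-defined kernel is then just another DoubleUnaryOperator, at the cost of a bisection per sampled value.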

Cheers, Mikkel.

>>>>> 2) make the kernel used within bins configurable.  Currently, values
>>>>> are generated (and the cdf would be computed) assuming a Gaussian
>>>>> distribution within bins.  I think at least a uniform option should
>>>>> be provided.
>>>> +1, maybe it can be generalised to providing user-defined kernels.
>>> Good idea.  Need to think about how to enable that.
>>>
>>> Thanks!
>>>
>>> Phil
>>>>> Thanks in advance for any feedback on this or further suggestions
>>>>> for improvement.
>>>>>
>>>>> Phil
>>>>>
>> Cheers, Mikkel.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org