2011/9/7 Phil Steitz <phil.steitz@gmail.com>:
> On 9/6/11 8:58 AM, Mikkel Meyer Andersen wrote:
>> 2011/9/6 Phil Steitz <phil.steitz@gmail.com>:
>>> On 9/6/11 12:00 AM, Mikkel Meyer Andersen wrote:
>>>> 2011/9/5 Phil Steitz <phil.steitz@gmail.com>:
>>>>> I have a couple of proposals for this class:
>>>>>
>>>>> 0) Merge the interface and impl. This is consistent with what we
>>>>> are doing in some other places where we have only one implementation.
>>>> Fine with me.
>>>>> 1) Extend this class to actually provide a distribution - i.e.
>>>>> implement the Distribution interface.
>>>> Won't we have problems, e.g. with implementing cumulativeProbability?
>>> The idea I had was to interpolate within bins. So to compute the
>>> cdf at x you would find its bin, sum the mass (based on number of
>>> original sample points contained, like the sampling does) of the
>>> bins below its containing bin and then use the defined kernel within
>>> bin to determine how much of its own bin's mass to include.
>> Seems reasonable. But: We might want to include a user-specified
>> support - just simple (endpoints of an interval) - or else the highest
>> and lowest values specify the support, which might not be a good idea.
>
> By the latter, do you mean just interpolate linearly between lowest
> and highest, or do you mean the lowest / highest actually observed
> points in the bin? The first is like using a uniform kernel in the
> bins. By "user-specified support" I guess you mean make the
> interpolation strategy pluggable somehow, right? What launched me
> into thinking about making the kernel used for sampling configurable
> was thinking about how uniform would probably be better / more
> defensible for use interpolating the cdf in some cases. Then you
> have to ask is it OK to use a different kernel for the sampling vs
> cdf computation. My instinct is to say no and keep it simple -
> allow a uniform kernel to be chosen in place of the hardcoded
> Gaussian there now and then use the configured kernel for both
> sampling and cdf computation. Even with mixed kernels, you will
> probably in most cases end up with decent fidelity between sampling
> results and the cdf; but I can imagine scenarios where Gaussian
> kernels with coarse grids could lead to funny sampling distributions
> that would not follow the linearly-interpolated cdf very well near
> grid points.
>
> Phil
"but I can imagine scenarios where Gaussian
kernels with coarse grids could lead to funny sampling distributions
that would not follow the linearly-interpolated cdf very well near
grid points."
Yes, precisely. Especially if trying to distribute the probability
mass on a discrete grid :).
To clarify what I meant by user-specified support:
If a user has observations 1, 3, 4, we would probably want to allow
probability mass elsewhere than just at {1, 2, 3, 4} (2 is
interpolated). By that I mean it might make sense to let the user
specify that the distribution is discrete with a support of {0,
1, 2, 3, 4, 5} (2 is interpolated, and so are 0 and 5). Similarly for
continuous distributions.
Or is that too ambitious?
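
For the continuous case, a rough sketch of the "endpoints of an
interval" idea, using the 1, 3, 4 example. The class and method names
here are made up for illustration; nothing like this exists in the
class today. It only shows binning over a user-supplied [lower, upper]
instead of [min observed, max observed]:

```java
/** Hypothetical sketch: bin counts over a user-specified support. */
final class SupportBinsSketch {

    /**
     * Counts observations in binCount equal-width bins spanning [lower, upper].
     * The endpoints come from the user, not from the data.
     */
    static long[] binCounts(double[] data, double lower, double upper, int binCount) {
        if (!(upper > lower) || binCount < 1) {
            throw new IllegalArgumentException("need lower < upper and binCount >= 1");
        }
        long[] counts = new long[binCount];
        double width = (upper - lower) / binCount;
        for (double v : data) {
            if (v < lower || v > upper) {
                throw new IllegalArgumentException("observation outside support: " + v);
            }
            // An observation equal to the upper endpoint goes into the last bin.
            int bin = Math.min((int) ((v - lower) / width), binCount - 1);
            counts[bin]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] data = {1, 3, 4};
        // Support taken from the data vs. support given by the user.
        System.out.println(java.util.Arrays.toString(binCounts(data, 1, 4, 3)));
        System.out.println(java.util.Arrays.toString(binCounts(data, 0, 5, 5)));
    }
}
```

With counts alone, the extra bins near 0 and 5 are empty, so they
still carry no mass; how to spread mass into them is exactly the open
question.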
Regarding kernels, I'm okay with only supporting uniform and Gaussian,
but we might think about it - we might come up with a clever solution
giving pluggable kernels almost for free (if we are lucky :)).
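
One possible shape for this, as a sketch only: the cdf is computed as
Phil describes (mass of the bins below, plus the kernel's share of the
containing bin), and the kernel is a small interface. BinKernel,
massBelow and BinnedCdfSketch are hypothetical names, and equal-width
bins are assumed. A Gaussian kernel would presumably also need the
per-bin mean and standard deviation, which this interface does not
carry.

```java
/** Hypothetical sketch; none of these names exist in Commons Math. */
final class BinnedCdfSketch {

    /** Within-bin kernel: fraction of the bin's mass at or below x, for a bin [lower, upper]. */
    interface BinKernel {
        double massBelow(double x, double lower, double upper);
    }

    /** Uniform kernel: mass grows linearly across the bin. */
    static final BinKernel UNIFORM = (x, lower, upper) -> (x - lower) / (upper - lower);

    /** Puts all of a bin's mass at its midpoint (a crude "discrete grid" kernel). */
    static final BinKernel MIDPOINT = (x, lower, upper) -> x >= (lower + upper) / 2 ? 1.0 : 0.0;

    /** cdf at x for equal-width bins starting at min, with counts[i] observations in bin i. */
    static double cdf(double x, double min, double binWidth, long[] counts, BinKernel kernel) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("no observations");
        }
        double max = min + binWidth * counts.length;
        if (x <= min) {
            return 0.0;
        }
        if (x >= max) {
            return 1.0;
        }
        // Find the containing bin; the clamp guards against rounding at the top edge.
        int bin = Math.min((int) ((x - min) / binWidth), counts.length - 1);
        long below = 0;
        for (int i = 0; i < bin; i++) {
            below += counts[i];
        }
        double lower = min + bin * binWidth;
        // Keep the kernel's answer in [0, 1] so the result stays a valid cdf value.
        double share = Math.max(0.0, Math.min(1.0, kernel.massBelow(x, lower, lower + binWidth)));
        return (below + share * counts[bin]) / total;
    }

    public static void main(String[] args) {
        long[] counts = {2, 1, 1};
        System.out.println(cdf(1.5, 0.0, 1.0, counts, UNIFORM));
        System.out.println(cdf(1.25, 0.0, 1.0, counts, MIDPOINT));
    }
}
```

Sampling would then have to invert the same kernel, so that the
generated values and the cdf agree.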
Cheers, Mikkel.
>>>>> 2) make the kernel used within bins configurable. Currently, values
>>>>> are generated (and the cdf would be computed) assuming a Gaussian
>>>>> distribution within bins. I think at least a uniform option should
>>>>> be provided.
>>>> +1, maybe it can be generalised to providing user-defined kernels.
>>> Good idea. Need to think about how to enable that.
>>>
>>> Thanks!
>>>
>>> Phil
>>>>> Thanks in advance for any feedback on this or further suggestions
>>>>> for improvement.
>>>>>
>>>>> Phil
>>>>>
>> Cheers, Mikkel.
>>
>> 
>> To unsubscribe, email: dev-unsubscribe@commons.apache.org
>> For additional commands, email: dev-help@commons.apache.org
>>
>>
