commons-dev mailing list archives

From Phil Steitz <>
Subject Re: [math] Re: "Straw man" release plan
Date Tue, 03 Feb 2004 19:29:57 GMT

--- Piotr Kochański <> wrote:

> > My thought was that we could do some things (e.g. estimate confidence
> > intervals) without storing the bootstrap samples or even the full set of
> > bootstrap statistics.
> This is not a problem at all. When we initialize EmpiricalDistribution
> using the load(...) method, we can calculate whatever we need, since we
> have the data set at that moment.
> The problem I see is that we have to specify a priori the statistics
> for which bootstrap confidence intervals or standard errors will be
> calculated.
> We should not make that decision for the user, so some configuration of
> the EmpiricalDistribution object would be necessary, e.g.
> load(double[][], UnivariateStatistics[])
> so that all the interesting calculations would be done for the provided
> UnivariateStatistics. The default choice could simply be SummaryStatistics:
> load(double[][]){
>    statisticsToBeBootstrapped[] = All SummaryStatistics
> }
> If bootstrap samples are not provided (e.g. the user calls a different
> load method), we can provide confidence intervals based on the
> normal distribution assumption (for those statistics for which
> this can be calculated).
> In fact, we could leave the choice of which summary statistics to
> calculate entirely to the user (e.g. for performance reasons: a user may
> never be interested in some statistics, yet they are computed anyway,
> which slows down initialization of the object), e.g.
> load(String, UnivariateStatistics[]) etc.
> Then the present getSampleStats() method should return
> an object that provides access to the calculated statistics and/or
> their confidence intervals.
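Piotr's proposal, together with the earlier point that intervals need not require storing the resamples, can be illustrated with a small sketch. This is hypothetical and not commons-math API: `BootstrapSketch`, `standardErrors`, `normalInterval` and `classicalMeanInterval` are invented names, and a plain `List<ToDoubleFunction<double[]>>` stands in for the proposed `UnivariateStatistics[]` argument. Each caller-chosen statistic's bootstrap replicates are folded into a running mean and variance (Welford's algorithm), so only two numbers per statistic are kept; the resulting standard error gives a normal-approximation interval, and a classical s/sqrt(n) interval for the mean covers the case where no bootstrap is run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical sketch (not commons-math API): bootstrap standard errors for a
 * caller-chosen list of statistics. Resamples are not kept, and neither is the
 * full set of bootstrap statistics: each statistic's replicates are folded into
 * a running mean and variance (Welford's algorithm).
 */
public class BootstrapSketch {

    /** One bootstrap standard error per statistic; requires replications >= 2. */
    public static double[] standardErrors(double[] data,
                                          List<ToDoubleFunction<double[]>> stats,
                                          int replications, long seed) {
        Random rng = new Random(seed);
        int k = stats.size();
        double[] runningMean = new double[k];
        double[] sumSqDev = new double[k];
        double[] resample = new double[data.length]; // buffer reused every round
        for (int b = 1; b <= replications; b++) {
            // Draw n values with replacement from the original data.
            for (int i = 0; i < data.length; i++) {
                resample[i] = data[rng.nextInt(data.length)];
            }
            // Update each statistic's running mean and sum of squared deviations.
            for (int j = 0; j < k; j++) {
                double value = stats.get(j).applyAsDouble(resample);
                double delta = value - runningMean[j];
                runningMean[j] += delta / b;
                sumSqDev[j] += delta * (value - runningMean[j]);
            }
        }
        double[] se = new double[k];
        for (int j = 0; j < k; j++) {
            se[j] = Math.sqrt(sumSqDev[j] / (replications - 1));
        }
        return se;
    }

    /** Normal-approximation interval: estimate +/- z * standard error. */
    public static double[] normalInterval(double estimate, double se, double z) {
        return new double[] {estimate - z * se, estimate + z * se};
    }

    /** No bootstrap available: classical interval for the mean, SE = s / sqrt(n). */
    public static double[] classicalMeanInterval(double[] x, double z) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        double s = Math.sqrt(ss / (x.length - 1));
        return normalInterval(m, s / Math.sqrt(x.length), z);
    }

    public static double mean(double[] x) {
        return Arrays.stream(x).average().orElse(Double.NaN);
    }

    public static double median(double[] x) {
        double[] sorted = x.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    }

    public static void main(String[] args) {
        double[] data = {2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5};
        List<ToDoubleFunction<double[]>> stats = new ArrayList<>();
        stats.add(BootstrapSketch::mean);    // the caller picks the statistics
        stats.add(BootstrapSketch::median);
        double[] se = standardErrors(data, stats, 2000, 42L);
        double[] ci = normalInterval(mean(data), se[0], 1.96);
        System.out.printf("SE(mean)=%.3f  SE(median)=%.3f%n", se[0], se[1]);
        System.out.printf("approx. 95%% CI for the mean: [%.3f, %.3f]%n", ci[0], ci[1]);
    }
}
```

The statistics list is the piece that corresponds to letting the user decide what gets bootstrapped: nothing is computed for a statistic the caller did not ask for.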

Ah, now I understand what you have been trying to communicate and I agree
that adding all of this functionality to EmpiricalDistribution is not a
good idea.  I was only considering the simple use case of modelling the
sampling distribution of a single, known statistic.  The more general case
in which the bootstrap samples are leveraged for inferences about multiple
statistics will require more complex machinery.  I suggest that we take
this up again post 1.0.  For now, I don't think it makes sense to
significantly modify EmpiricalDistribution (though given the confusion, it
might be better to change the name :-)
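For contrast, the simple use case mentioned above, modelling the sampling distribution of a single known statistic, can be sketched as a percentile bootstrap. The names `PercentileBootstrap` and `interval` are hypothetical, and the code does not use EmpiricalDistribution: the sorted array of bootstrap replicates simply plays the role of the simulated sampling distribution, and the interval endpoints are read off its alpha/2 and 1 - alpha/2 percentiles using a simple floor/ceil index rule (other percentile conventions exist).

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical sketch of the single-statistic case: simulate the sampling
 * distribution of one known statistic by resampling, then read a percentile
 * confidence interval off the sorted bootstrap replicates.
 */
public class PercentileBootstrap {

    /** Returns {lower, upper} for a (1 - alpha) percentile bootstrap interval. */
    public static double[] interval(double[] data, ToDoubleFunction<double[]> statistic,
                                    int replications, double alpha, long seed) {
        Random rng = new Random(seed);
        double[] replicates = new double[replications];
        double[] resample = new double[data.length];
        for (int b = 0; b < replications; b++) {
            // Resample with replacement and record the statistic's value.
            for (int i = 0; i < data.length; i++) {
                resample[i] = data[rng.nextInt(data.length)];
            }
            replicates[b] = statistic.applyAsDouble(resample);
        }
        Arrays.sort(replicates); // empirical sampling distribution of the statistic
        int lo = (int) Math.floor(alpha / 2 * replications);
        int hi = (int) Math.ceil((1 - alpha / 2) * replications) - 1;
        // Clamp indices so small replication counts cannot go out of bounds.
        lo = Math.max(0, Math.min(lo, replications - 1));
        hi = Math.max(lo, Math.min(hi, replications - 1));
        return new double[] {replicates[lo], replicates[hi]};
    }

    public static void main(String[] args) {
        double[] data = {2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5};
        ToDoubleFunction<double[]> mean = x -> Arrays.stream(x).average().orElse(Double.NaN);
        double[] ci = interval(data, mean, 1000, 0.05, 7L);
        System.out.printf("95%% percentile interval for the mean: [%.3f, %.3f]%n", ci[0], ci[1]);
    }
}
```

Here all replicates of the one statistic are stored and sorted, which is cheap for a single statistic; the multi-statistic case is where the storage and configuration questions discussed above start to matter.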

