commons-dev mailing list archives

From "Phil Steitz" <p...@steitz.com>
Subject Re: [math] Re: "Straw man" release plan
Date Tue, 03 Feb 2004 12:13:14 GMT
Piotr Kochański wrote:
> Mark R. Diggory wrote:

> 
> Exactly, but the point is that we have to preserve the original/bootstrap
> values, and EmpiricalDistribution does not store them - internally it keeps
> the data in an array of bins.

My thought was that we could do some things (e.g. estimate confidence 
intervals) without storing the bootstrap samples or even the full set of 
bootstrap statistics.

> As I understand it, this was the aim - we don't have to
> keep the whole data set in order to get important information about the
> empirical distribution.  If a data set is huge, this is a real gain.

Yes.  This is why EmpiricalDistribution exists.
> 
> If, on the other hand, I want to keep the whole data set, then I can easily
> use other tools to calculate any statistics I want, so I don't need to use
> EmpiricalDistribution.

Yes. Even for the bootstrap percentiles, if the number of bootstrap 
samples is small enough to store the stats in memory, we could get the 
percentiles directly by applying Percentile to the stored values.
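
A minimal sketch of that direct approach, assuming the statistic of interest 
is the sample mean and that all B bootstrap statistics fit in memory. The 
class BootstrapPercentiles and its methods are hypothetical names used for 
illustration only; a simple nearest-rank percentile stands in for the 
library's Percentile class so that the example has no dependencies.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper for illustration; not part of commons-math.
final class BootstrapPercentiles {

    /** Mean of data.length values drawn with replacement from data. */
    static double resampledMean(double[] data, Random rng) {
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            sum += data[rng.nextInt(data.length)];
        }
        return sum / data.length;
    }

    /** p-th percentile (0 < p <= 100) of an ascending array, nearest-rank rule. */
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /**
     * Percentile bootstrap interval for the mean. Only the b bootstrap
     * statistics are stored; the resamples themselves are never kept.
     */
    static double[] meanInterval(double[] data, int b, double level, long seed) {
        Random rng = new Random(seed);
        double[] stats = new double[b];
        for (int i = 0; i < b; i++) {
            stats[i] = resampledMean(data, rng);
        }
        Arrays.sort(stats);
        double tail = (1.0 - level) / 2.0 * 100.0;
        return new double[] { percentile(stats, tail), percentile(stats, 100.0 - tail) };
    }

    public static void main(String[] args) {
        double[] data = { 2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7 };
        double[] ci = meanInterval(data, 1000, 0.95, 42L);
        System.out.println("95% interval for the mean: [" + ci[0] + ", " + ci[1] + "]");
    }
}
```

Memory use here grows with the number of bootstrap replicates, not with the 
size of each resample, which is why this only works when B is modest.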
> 
> The documentation for EmpiricalDistribution gives two example applications
> of this interface - preparing data for drawing a histogram and providing
> methods to draw random numbers from such a distribution. I am
> wondering whether making EmpiricalDistribution responsible for other tasks,
> like handling bootstrap samples or even doing the bootstrap, would
> make it too complicated to use.

That is why I asked the question.  What is going on is that to meet the 
needs of the second use case above, something like a variable kernel 
density estimator was developed.  This has many uses beyond generating 
random data.  Among these might be supporting inference based on large 
numbers of bootstrap samples.  Given that the implementation now requires 
two passes through the data, there is probably not much value to this 
approach using the current implementation.  What I wanted to verify is 
that the interface is adequate to support this kind of inference (and the 
other kinds of things that it might be used for).  I never intended to 
imply that EmpiricalDistribution would manipulate or generate bootstrap 
samples itself.
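
To make the two-pass point concrete, here is a rough sketch of a binned 
summary that answers percentile queries without keeping the raw values. It 
is a simplified fixed-width histogram, not the EmpiricalDistribution 
implementation or its kernel-based approach; the class BinnedSummary and its 
methods are hypothetical, and the input is assumed to be non-empty.

```java
// Hypothetical fixed-width histogram for illustration; not part of commons-math.
final class BinnedSummary {
    final double min;
    final double max;
    final long[] counts;
    long total;

    BinnedSummary(double min, double max, int binCount) {
        this.min = min;
        this.max = max;
        this.counts = new long[binCount];
    }

    /** Adds one value to its bin; values outside [min, max] go to the end bins. */
    void add(double x) {
        int i = (int) ((x - min) / (max - min) * counts.length);
        i = Math.min(Math.max(i, 0), counts.length - 1);
        counts[i]++;
        total++;
    }

    /** Approximate p-th percentile, interpolating linearly inside the bin. */
    double percentile(double p) {
        double target = p / 100.0 * total;
        double width = (max - min) / counts.length;
        long cumulative = 0;
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0 && cumulative + counts[i] >= target) {
                double fraction = (target - cumulative) / counts[i];
                return min + (i + fraction) * width;
            }
            cumulative += counts[i];
        }
        return max;
    }

    /** Pass 1 finds the range; pass 2 fills the bins. */
    static BinnedSummary fromTwoPasses(double[] data, int binCount) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double x : data) {
            lo = Math.min(lo, x);
            hi = Math.max(hi, x);
        }
        BinnedSummary summary = new BinnedSummary(lo, hi, binCount);
        for (double x : data) {
            summary.add(x);
        }
        return summary;
    }

    public static void main(String[] args) {
        double[] data = new double[101];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        BinnedSummary summary = fromTwoPasses(data, 10);
        System.out.println("approximate median: " + summary.percentile(50.0));
    }
}
```

The first pass exists only to fix the bin range. If the range were known in 
advance, bootstrap statistics could be added one at a time as they are 
generated, which is the streaming use that a two-pass design rules out.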

Phil

> 
> Piotr
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
> 


