On 9/29/07, Phil Steitz <phil.steitz@gmail.com> wrote:
>
> On 9/22/07, Bradford Cross <bradford.n.cross@gmail.com> wrote:
>
> > the Smart updates are a key feature for event stream processing / time
> > series simulation. The only piece that is missing from a time series
> > analysis and simulation perspective is the ability to supply a lag that
> > defines a fixed sample size and perform rolling calculations.
> >
>
> That functionality actually already exists in the
> DescriptiveStatistics class. You can set a "window size" for rolling
> computations of univariate statistics using the concrete
> implementation of this class,
> o.a.c.math.stat.descriptive.DescriptiveStatisticsImpl. See
> http://commons.apache.org/math/userguide/stat.html
Cool, I had not seen this yet.
> >
> > If the community is OK with this initial spike, then we can start submitting
> > patches. :)
> >
>
> Thanks for the contribution! There are a few problems with
> incorporating the code as is, though. First it uses generics and the
> concurrent package, which requires JDK 1.5 and our current minimum JDK
> level is 1.3. That could probably be eliminated fairly easily,
> though. The second is really whether or not the queue implementation
> is going to improve performance over the ResizableDoubleArray store
> that DescriptiveStatisticsImpl uses now. If you think so and can
> demonstrate with benchmarks, we can talk about swapping out that
> implementation. Otherwise, it's probably better to use
> ResizableDoubleArray.
Yes, this is just a "spike", a proof of concept. :) Today I set up a
benchmark test and swapped in lots of different collections. The fastest
Java queue I found is the ArrayDeque from Java 6. Interestingly, the
calculations are about twice as fast using this queue compared with some of
the other queue implementations in the Java collections for a run of about
10K calls to calculate(). Nevertheless, the ResizableDoubleArray seems to
be a bit faster. I will formalize my benchmarking with a bit more rigor and
publish the results on this thread.
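
To make the comparison concrete, here is a minimal sketch of a fixed-window
rolling sum backed by java.util.ArrayDeque next to one backed by a plain
double[] ring buffer. All class names are made up for this example; this is
not the spike code and not ResizableDoubleArray, and it is not a benchmark.
It only shows the two kinds of backing store side by side. One plausible
reason for a primitive store to come out ahead is that the deque has to box
every value as a Double, but I have not measured that here.

```java
import java.util.ArrayDeque;

/** Feeds the same values to both stores and prints the running sums. */
class RollingSumDemo {
    public static void main(String[] args) {
        DequeRollingSum deque = new DequeRollingSum(3);
        ArrayRollingSum array = new ArrayRollingSum(3);
        for (int i = 1; i <= 5; i++) {
            System.out.println(deque.calculate(i) + " " + array.calculate(i));
        }
    }
}

/** Hypothetical rolling sum backed by java.util.ArrayDeque (Java 6+). */
class DequeRollingSum {
    private final ArrayDeque<Double> window = new ArrayDeque<Double>();
    private final int size;
    private double sum;

    DequeRollingSum(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("window size must be >= 1");
        }
        this.size = size;
    }

    /** Adds a value, evicting the oldest one once the window is full. */
    double calculate(double value) {
        if (window.size() == size) {
            sum -= window.removeFirst(); // unboxes the evicted Double
        }
        window.addLast(value);           // boxes the new value
        sum += value;
        return sum;
    }
}

/** Hypothetical rolling sum backed by a primitive ring buffer (no boxing). */
class ArrayRollingSum {
    private final double[] window;
    private int next;   // slot that will be overwritten next
    private int count;  // number of values currently stored
    private double sum;

    ArrayRollingSum(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("window size must be >= 1");
        }
        this.window = new double[size];
    }

    /** Adds a value, overwriting the oldest one once the window is full. */
    double calculate(double value) {
        if (count == window.length) {
            sum -= window[next]; // oldest value sits in the slot we reuse
        } else {
            count++;
        }
        window[next] = value;
        next = (next + 1) % window.length;
        sum += value;
        return sum;
    }
}
```

With a window of 3 and the inputs 1 to 5, both stores report 1.0, 3.0, 6.0,
9.0 and 12.0. Note that updating the sum by subtraction can accumulate
floating-point drift over long runs; a real implementation might recompute
from the stored window instead.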
> I am +1 on adding a RollingStatistic abstract base class (would prefer
> that name to "Statistic" since it is specialized) like you have
> defined and rolling versions of the individual statistics. This would
> be a convenience over the current setup and provide a more intuitive
> way to access rolling stats than to use DescriptiveStatisticsImpl as a
> container. Currently this is the only way to do it. So if you
> can refactor to either use ResizableDoubleArray as the backing store
> (look at DescriptiveStatisticsImpl.apply; the convenience classes
> could just use that pattern) or otherwise eliminate the JDK 1.5
> dependency, I would support adding the rolling stats. If I understand
> correctly the idea of what you mean by Sum, and Mean (using
> constructor arguments to determine whether or not statistic is
> rolling), I would prefer to leave the existing statistics in
> commons-math as is and introduce Rolling versions as separate classes.
Sounds good. I will start working on the RollingStatistics. As for the
convenience pattern I used in Mean/Sum (using constructor arguments to
determine whether or not a statistic is rolling), it is easy to do that
refactoring later, after the rolling statistics are added. We can just leave
the current statistics as is and wait to see whether we find a good reason
to do it.
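
Just so we are talking about the same shape, here is a rough sketch of what
I understand by an abstract base class plus separate rolling classes. Every
name and signature below is hypothetical and up for discussion; none of it
is existing commons-math API. It avoids generics and java.util.concurrent,
and it keeps the window in a plain double[] ring buffer as a stand-in for
whatever backing store we settle on.

```java
/** Prints the rolling sum and mean of 1, 2, 3, 4 over a window of 3. */
class RollingStatisticDemo {
    public static void main(String[] args) {
        RollingStatistic sum = new RollingSum(3);
        RollingStatistic mean = new RollingMean(3);
        for (int i = 1; i <= 4; i++) {
            sum.increment(i);
            mean.increment(i);
        }
        System.out.println(sum.getResult() + " " + mean.getResult());
    }
}

/** Hypothetical base class: keeps the most recent windowSize values. */
abstract class RollingStatistic {
    private final double[] window;
    private int next;   // slot that will be overwritten next
    private int count;  // number of values currently stored

    protected RollingStatistic(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("window size must be >= 1");
        }
        this.window = new double[windowSize];
    }

    /** Adds a value, overwriting the oldest one once the window is full. */
    public void increment(double value) {
        window[next] = value;
        next = (next + 1) % window.length;
        if (count < window.length) {
            count++;
        }
    }

    /** Number of values currently in the window. */
    public int getN() {
        return count;
    }

    /** Statistic over the current window, or NaN if no values were added. */
    public double getResult() {
        if (count == 0) {
            return Double.NaN;
        }
        return evaluate(window, count);
    }

    /**
     * Computes the statistic over the first count slots of values.
     * The slots are not in arrival order, so this sketch only suits
     * order-independent statistics such as sum and mean.
     */
    protected abstract double evaluate(double[] values, int count);
}

/** Hypothetical rolling sum as its own class. */
class RollingSum extends RollingStatistic {
    RollingSum(int windowSize) {
        super(windowSize);
    }

    protected double evaluate(double[] values, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            sum += values[i];
        }
        return sum;
    }
}

/** Hypothetical rolling mean as its own class. */
class RollingMean extends RollingStatistic {
    RollingMean(int windowSize) {
        super(windowSize);
    }

    protected double evaluate(double[] values, int count) {
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            sum += values[i];
        }
        return sum / count;
    }
}
```

The existing Sum and Mean stay untouched in this layout; the rolling
variants only share the windowing logic in the base class.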
> One more thing. It is very important that any contributions that you
> make can be made in accordance with the Apache Contributor's License
> Agreement. Have a look here:
> http://www.apache.org/licenses/#clas
> and make sure you can agree to those terms.
Yep, no problems. :)
> Then you can start
> submitting patches with attachments to Jira tickets.
Sounds good.
