OK, I have created a patch. I tried to follow the instructions to file a
bug in Bugzilla, but I can't seem to find the right place to file a new bug
against either Commons or Commons Math.
I wonder if someone could help me out.
/b
On 9/29/07, Phil Steitz <phil.steitz@gmail.com> wrote:
>
> On 9/22/07, Bradford Cross <bradford.n.cross@gmail.com> wrote:
> > Greetings!
> >
> > Recently I stumbled into the Commons math project; nice design, good
> > abstractions, "smart updates" and even unit tests! :)
> >
> Thanks!
>
> > The smart updates are a key feature for event stream processing / time
> > series simulation. The only piece missing from a time series
> > analysis and simulation perspective is the ability to supply a lag that
> > defines a fixed sample size and to perform rolling calculations.
> >
>
> That functionality actually already exists in the
> DescriptiveStatistics class. You can set a "window size" for rolling
> computations of univariate statistics using the concrete
> implementation of this class,
> o.a.c.math.stat.descriptive.DescriptiveStatisticsImpl. See
> http://commons.apache.org/math/userguide/stat.html
>
> > I was very happy to see this as an item on the wish list.
>
> The wishlist item is not as clear as it could be. Sorry about that.
> In addition to the computations in DescriptiveStatistics that require
> that you maintain all of the values in the current window in memory,
> we also support "storeless" computation of statistics that can be
> computed in one pass through the data. This allows very large data
> streams to be handled with fixed storage overhead. I think that what
> the wishlist item refers to is something in between: ways to support
> the window concept without storing all of the data. Strictly
> speaking, this is impossible, but doing things like sampling from the
> streams, periodically resetting, or maintaining arrays of storeless
> stats with different offsets would in theory be possible.
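For illustration, here is a minimal sketch of the "storeless" idea in plain Java. The class StorelessMean is hypothetical and is not the commons-math implementation. The mean is updated in one pass with constant memory, so it can report the mean of everything seen so far, but it cannot drop an old value when a window slides. Calling clear() corresponds to the "periodically resetting" option.

```java
/**
 * Hypothetical one-pass ("storeless") mean: O(1) memory regardless of
 * stream length. Illustrative sketch only, not the commons-math class.
 */
final class StorelessMean {
    private long n = 0;         // number of values seen so far
    private double mean = 0.0;  // running mean of all values seen

    void increment(double x) {
        n++;
        // Incremental update: m_n = m_{n-1} + (x - m_{n-1}) / n
        mean += (x - mean) / n;
    }

    double getResult() {
        // No values yet: the mean is undefined
        return n == 0 ? Double.NaN : mean;
    }

    long getN() {
        return n;
    }

    /** Forget everything seen so far (the "periodic reset" option). */
    void clear() {
        n = 0;
        mean = 0.0;
    }
}
```

Feeding 1, 2, 3, 4 into increment() leaves getResult() at 2.5. Note that there is no way to "un-increment" the oldest value without having stored it.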
> >
> > A ThoughtWorks colleague (Yaxin Wang) and I are prototyping a Java time
> > series simulation engine, and we are considering Commons Math as the base
> > of our numerical libraries. In order to do this we need to complete the
> > rolling calculations, so here is our first spike ("spike" means a prototype
> > that can be thrown away, not a real patch). We thought we would start with
> > an easy case: mean, which uses sum.
> >
> > We have already combined rolling calculations with smart update
> > algorithms in the numerical libraries for our previous time series
> > simulation engine. As you mention in the wish list notes, our past
> > experience is that some of the algorithms cannot avoid using queues in
> > the rolling-update case. Obviously this is pretty fundamental to the
> > design and requires a bit of work across a lot of places to do for all
> > the statistics (starting at least with the summary statistics).
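A minimal sketch of the queue-plus-running-sum approach to a rolling mean, assuming a fixed window of the last windowSize values. QueueRollingMean is a hypothetical name, and this is not the spike's actual code. Each update is O(1): push the new value and add it to the sum, and if the window is over capacity, pop the oldest value and subtract it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical rolling mean over the last windowSize values, using a
 * queue plus a running sum. Sketch only; not the spike's actual code.
 */
final class QueueRollingMean {
    private final int windowSize;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    QueueRollingMean(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    void addValue(double x) {
        window.addLast(x);
        sum += x;
        // Once the window is over capacity, evict the oldest value and
        // subtract it from the running sum, so each update is O(1).
        if (window.size() > windowSize) {
            sum -= window.removeFirst();
        }
    }

    double getMean() {
        // Mean of the values currently in the window (NaN if none yet)
        return window.isEmpty() ? Double.NaN : sum / window.size();
    }
}
```

This version relies on generics, autoboxing and ArrayDeque, so it needs a modern JDK. Repeatedly adding to and subtracting from a running sum can also accumulate floating-point rounding error over long streams, so a real implementation might periodically recompute the sum from the window contents.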
> >
> > Please give feedback on the design, any issues with performance (a better
> > data structure than the queue we used), etc!
> >
> > If the community is OK with this initial spike, then we can start
> > submitting patches. :)
> >
>
> Thanks for the contribution! There are a few problems with
> incorporating the code as is, though. First, it uses generics and the
> concurrent package, which require JDK 1.5, and our current minimum JDK
> level is 1.3. That could probably be eliminated fairly easily,
> though. The second is really whether or not the queue implementation
> is going to improve performance over the ResizableDoubleArray store
> that DescriptiveStatisticsImpl uses now. If you think so and can
> demonstrate it with benchmarks, we can talk about swapping out that
> implementation. Otherwise, it's probably better to use
> ResizableDoubleArray.
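One way to drop the generics and java.util.concurrent dependency while still getting O(1) eviction is a fixed-capacity ring buffer over a primitive double[]. The sketch below is hypothetical (DoubleRingBuffer is not a commons-math class, and ResizableDoubleArray has its own, different API). It avoids generics, autoboxing and annotations, in keeping with an old minimum JDK; whether it beats ResizableDoubleArray would still have to be shown with benchmarks.

```java
/**
 * Hypothetical fixed-capacity ring buffer of primitive doubles.
 * No generics, autoboxing, annotations or java.util.concurrent.
 */
final class DoubleRingBuffer {
    private final double[] data;
    private int start = 0; // index of the oldest element
    private int size = 0;  // number of valid elements

    DoubleRingBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        data = new double[capacity];
    }

    /** Appends x, overwriting the oldest value once the buffer is full. */
    void add(double x) {
        if (size < data.length) {
            data[(start + size) % data.length] = x;
            size++;
        } else {
            data[start] = x; // overwrite the oldest value in place
            start = (start + 1) % data.length;
        }
    }

    boolean isFull() {
        return size == data.length;
    }

    int size() {
        return size;
    }

    /** Returns the i-th value, where i = 0 is the oldest. */
    double get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + " out of range");
        }
        return data[(start + i) % data.length];
    }

    /** Copies the current window into a new array, oldest value first. */
    double[] toArray() {
        double[] out = new double[size];
        for (int i = 0; i < size; i++) {
            out[i] = get(i);
        }
        return out;
    }
}
```

Adding 1, 2, 3, 4 to a buffer of capacity 3 leaves [2.0, 3.0, 4.0] in oldest-first order. A rolling sum would read get(0) before add() when isFull() is true and subtract it.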
>
> I am +1 on adding a RollingStatistic abstract base class (I would prefer
> that name to "Statistic" since it is specialized) like the one you have
> defined, and rolling versions of the individual statistics. This would
> be a convenience over the current setup and provide a more intuitive
> way to access rolling stats than using DescriptiveStatisticsImpl as a
> container, which is currently the only way to do it. So if you
> can refactor to either use ResizableDoubleArray as the backing store
> (look at DescriptiveStatisticsImpl.apply; the convenience classes
> could just use that pattern) or otherwise eliminate the JDK 1.5
> dependency, I would support adding the rolling stats. If I correctly
> understand what you mean by Sum and Mean (using constructor
> arguments to determine whether or not a statistic is rolling), I
> would prefer to leave the existing statistics in commons-math as is
> and introduce Rolling versions as separate classes.
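A rough sketch of what such a RollingStatistic base class might look like, following the "recompute over the stored window" pattern rather than incremental updates. All names here (RollingStatistic, RollingSum, RollingMean, evaluate) are hypothetical, not an existing commons-math API. The window is kept in a plain double[] to avoid any JDK 1.5 dependency, and the existing non-rolling statistics are left untouched.

```java
/**
 * Hypothetical base class for rolling statistics. The base class keeps
 * the last windowSize values; each subclass only supplies evaluate().
 */
abstract class RollingStatistic {
    private final double[] window;
    private int start = 0;  // index of the oldest value
    private int count = 0;  // number of values currently in the window

    protected RollingStatistic(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        window = new double[windowSize];
    }

    public void addValue(double x) {
        if (count < window.length) {
            window[(start + count) % window.length] = x;
            count++;
        } else {
            window[start] = x; // overwrite the oldest value
            start = (start + 1) % window.length;
        }
    }

    public int getN() {
        return count;
    }

    /** Recomputes the statistic over a snapshot of the window, oldest first. */
    public double getResult() {
        double[] values = new double[count];
        for (int i = 0; i < count; i++) {
            values[i] = window[(start + i) % window.length];
        }
        return evaluate(values);
    }

    /** Computes the statistic over the given window contents. */
    protected abstract double evaluate(double[] values);
}

/** Hypothetical rolling sum over the last windowSize values. */
final class RollingSum extends RollingStatistic {
    RollingSum(int windowSize) {
        super(windowSize);
    }

    protected double evaluate(double[] values) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }
}

/** Hypothetical rolling mean; NaN when the window is still empty. */
final class RollingMean extends RollingStatistic {
    RollingMean(int windowSize) {
        super(windowSize);
    }

    protected double evaluate(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum / values.length;
    }
}
```

Here getResult() costs O(windowSize) per call. A subclass could instead maintain a running sum for O(1) updates, at the cost of possible floating-point drift over long streams.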
>
> One more thing. It is very important that any contributions that you
> make can be made in accordance with the Apache Contributor's License
> Agreement. Have a look here:
> http://www.apache.org/licenses/#clas
> and make sure you can agree to those terms. Then you can start
> submitting patches as attachments to Jira tickets.
>
> Thanks!
>
> Phil
>
> 
> To unsubscribe, email: dev-unsubscribe@commons.apache.org
> For additional commands, email: dev-help@commons.apache.org
>
>
