commons-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: [math] autocorr
Date Mon, 27 Sep 2010 17:39:45 GMT
Commons Math and Mahout are independent Apache projects with very different
goals and history. Math has a much broader goal of general math support,
while Mahout has a very focused goal of building scalable data mining
capabilities quickly. The fact that Mahout doesn't use Commons Math is
unfortunate, but it stems from the different time scales those goals imply.

Mahout's primary math support is inherited from Colt, but we are actively
deleting capabilities from Colt that we don't think will contribute to the
goal of scalable data mining. If we are going to use any capability from
Colt, we need to spend significant effort building tests for it, and we
don't want to carry around a bunch of code that isn't useful.

Specifically, while Mahout doesn't have Cholesky decomposition, it does have
QR decomposition, which is generally just about as useful. We haven't yet
ported LU decomposition because its utility for very large systems, which
are commonly sparse, is dubious.
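To illustrate why QR can usually stand in for Cholesky, here is a minimal plain-Java sketch for the common least-squares case. It assumes a small dense, full-rank matrix and uses no Mahout or Commons Math classes; the class and method names are hypothetical. It solves min ||Ax - b|| two ways: Cholesky factorization of the normal equations A^T A x = A^T b, and a thin QR factorization of A (modified Gram-Schmidt) followed by back-substitution on R x = Q^T b.

```java
import java.util.Arrays;

/** Hypothetical illustration: least squares via Cholesky (normal equations) vs. QR. */
final class QrVsCholesky {

    /** Solves min ||A x - b|| by Cholesky-factoring A^T A = L L^T. */
    static double[] solveCholesky(double[][] a, double[] b) {
        int m = a.length, n = a[0].length;
        // Form the normal equations G x = h with G = A^T A, h = A^T b.
        double[][] g = new double[n][n];
        double[] h = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++)
                for (int k = 0; k < m; k++) g[i][j] += a[k][i] * a[k][j];
            for (int k = 0; k < m; k++) h[i] += a[k][i] * b[k];
        }
        // Cholesky factor: G = L L^T, L lower triangular.
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j <= i; j++) {
                double s = g[i][j];
                for (int k = 0; k < j; k++) s -= l[i][k] * l[j][k];
                l[i][j] = (i == j) ? Math.sqrt(s) : s / l[j][j];
            }
        // Forward substitution L y = h.
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double s = h[i];
            for (int k = 0; k < i; k++) s -= l[i][k] * y[k];
            y[i] = s / l[i][i];
        }
        // Back substitution L^T x = y.
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = y[i];
            for (int k = i + 1; k < n; k++) s -= l[k][i] * x[k];
            x[i] = s / l[i][i];
        }
        return x;
    }

    /** Solves min ||A x - b|| with a thin QR of A (modified Gram-Schmidt). */
    static double[] solveQr(double[][] a, double[] b) {
        int m = a.length, n = a[0].length;
        double[][] q = new double[m][n];   // overwritten column by column with Q
        double[][] r = new double[n][n];
        for (int i = 0; i < m; i++) q[i] = a[i].clone();
        for (int j = 0; j < n; j++) {
            double norm = 0;
            for (int i = 0; i < m; i++) norm += q[i][j] * q[i][j];
            r[j][j] = Math.sqrt(norm);
            for (int i = 0; i < m; i++) q[i][j] /= r[j][j];
            // Remove the component along q_j from every later column.
            for (int k = j + 1; k < n; k++) {
                double dot = 0;
                for (int i = 0; i < m; i++) dot += q[i][j] * q[i][k];
                r[j][k] = dot;
                for (int i = 0; i < m; i++) q[i][k] -= dot * q[i][j];
            }
        }
        // Back substitution on R x = Q^T b.
        double[] x = new double[n];
        for (int j = n - 1; j >= 0; j--) {
            double s = 0;
            for (int i = 0; i < m; i++) s += q[i][j] * b[i];
            for (int k = j + 1; k < n; k++) s -= r[j][k] * x[k];
            x[j] = s / r[j][j];
        }
        return x;
    }

    public static void main(String[] args) {
        // Fit y = c0 + c1 * t to four points lying exactly on y = 1 + 2t.
        double[][] a = {{1, 0}, {1, 1}, {1, 2}, {1, 3}};
        double[] b = {1, 3, 5, 7};
        System.out.println(Arrays.toString(solveCholesky(a, b)));
        System.out.println(Arrays.toString(solveQr(a, b)));
    }
}
```

Both calls should print values very close to [1.0, 2.0]. The design point is that QR works on A directly instead of forming A^T A, which squares the condition number; that is one reason QR is often an acceptable substitute when Cholesky is missing.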

On Mon, Sep 27, 2010 at 10:24 AM, video axescon <video@axescon.com> wrote:

> Thank you for clarification. I have to think a little of what to do now.
>
> The thing is that you seem to cherry-pick components into both
> commons-math and Mahout instead of bulk porting. For instance, I found
> autoCorrelation(...) in Descriptive in Mahout, but not in commons-math. At
> the same time, there's no Cholesky decomposition in Mahout, and it's in
> commons-math. This is a bit frustrating to me.
>
> On Mon, Sep 27, 2010 at 12:29 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > In general, commons math *is* the better choice for general mathematical
> > computing.  Their mission is to provide a general mathematical substrate.
> >
> > Apache Mahout's mission is to provide scalable data mining.  Part of that
> > requires basic math which we took from Colt rather than from commons math
> > due to the compatibility constraints that commons math has.
> >
> > So, if implementing autocorr on top of Commons Math is good for you, that
> > sounds like an excellent option (it is just a dot product with an offset,
> > after all).
> >
> > If that starts to require something that Commons Math can't easily
> > provide, Apache Mahout's math library (which is a separate jar, btw) may
> > be better, since we are a bit more agile. If your time series work starts
> > to involve serious scaling pains, then Mahout may be a good substrate from
> > that standpoint as well.
> >
> > On Mon, Sep 27, 2010 at 8:15 AM, video axescon <video@axescon.com> wrote:
> >
> > > Hello
> > >
> > > I'm a little confused now. I want to work on time series analysis,
> > > stuff like GARCH or VAR. Are you suggesting that Mahout can be the
> > > proper home for time series code? I guess it doesn't matter which
> > > library I start with, as long as it has good basic stats, optimization,
> > > and matrix code in it. Commons Math seemed to be the more logical
> > > choice to me.
> > >
> > > cheers
> > >
> > >
> > > On Mon, Sep 27, 2010 at 11:04 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > >
> > > > Commons math has a strict backwards compatibility constraint.
> > > >
> > > > Apache Mahout does not.
> > > >
> > > > For fixed lag, it should only require a few lines of code in Mahout,
> > > > and you should be up and running in a week or so on the trunk
> > > > version.
> > > >
> > > > On Mon, Sep 27, 2010 at 7:47 AM, video axescon <video@axescon.com> wrote:
> > > >
> > > > > > If you have a need for autocorrelation and would like to work
> > > > > > with us to rehabilitate and port the associated Colt code, I would
> > > > > > be happy to help by advising about our nascent conventions about
> > > > > > how we are organizing our code and what sort of testing and
> > > > > > porting is needed.
> > > > > >
> > > > > >
> > > > > I'm contemplating it. I'm a little bit concerned about the
> > > > > bureaucracy in this project; it could be easier for me to simply
> > > > > implement it for myself.
> > > >
> > >
> >
>
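The remark in the thread that autocorrelation "is just a dot product with an offset" can be made concrete. Below is a minimal plain-Java sketch of the sample autocorrelation at a single fixed lag k. The class and method names are hypothetical, not part of Commons Math or Mahout. It assumes the common biased estimator r_k = sum_{t=0}^{n-1-k} (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2; other normalizations (for example dividing the numerator by n - k) also exist.

```java
/** Hypothetical helper: sample autocorrelation at a fixed lag. */
final class Autocorr {

    /** Returns r_k for lag k, with 0 <= k < x.length (biased estimator). */
    static double autocorrelation(double[] x, int lag) {
        int n = x.length;
        if (lag < 0 || lag >= n) {
            throw new IllegalArgumentException("lag must be in [0, n)");
        }
        double mean = 0;
        for (double v : x) mean += v;
        mean /= n;

        // Lag-0 term: sum of squared deviations, used as the normalizer.
        // A constant series makes this zero and the result NaN.
        double denom = 0;
        for (double v : x) denom += (v - mean) * (v - mean);

        // Dot product of the centered series with itself shifted by 'lag'.
        double num = 0;
        for (int t = 0; t + lag < n; t++) {
            num += (x[t] - mean) * (x[t + lag] - mean);
        }
        return num / denom;
    }

    public static void main(String[] args) {
        double[] series = {1, 2, 3, 4, 5};
        for (int k = 0; k < 3; k++) {
            // Prints lag 0: 1.000, lag 1: 0.400, lag 2: -0.100
            System.out.printf("lag %d: %.3f%n", k, autocorrelation(series, k));
        }
    }
}
```

For a fixed lag this is a single pass over the data after centering, which is consistent with the estimate in the thread that it needs only a few lines of code in either library.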
