commons-dev mailing list archives

From luc.maison...@free.fr
Subject Re: [math] Questions about the linear package
Date Wed, 14 Oct 2009 10:32:20 GMT

----- "Ted Dunning" <ted.dunning@gmail.com> wrote:

> I would like to add my voice as a Mahout committer.  We would LOVE to
> use commons math in Mahout, but these and a few other issues prevent it.
> 
> There was word some time ago about integrating a high performance
> linear package such as MTJ into math.  Is that stalled?

If anybody is willing to do it, it is fine. I don't know if Sam is still around and willing
to help.

Luc

> 
> On Tue, Oct 13, 2009 at 10:50 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> 
> > Greetings, commons-math!
> >
> >  I've been looking at a variety of apache/bsd-licensed linear libraries
> > for use in massively parallel machine-learning applications I've been
> > working on (I am housing my own open-source library at
> > http://decomposer.googlecode.com, and am looking at integrating
> > with/using/contributing to Apache Mahout), and I'm wondering a little
> > about the linear API here in commons-math:
> >
> >  * also for RealVector - No iterator methods?  So if the implementation
> > is sparse, there's no way to just iterate over the non-zero entries?
> > What's worse, you can't even subclass OpenMapVector and expose the
> > iterator on the OpenIntToDoubleHashMap inner object, because it's
> > private. :\
> >
> >  * for RealVector - what's with the million different methods mapXXX(),
> > mapXXXtoSelf()?  Why not just map(UnaryFunction()) and
> > mapToSelf(UnaryFunction()), where UnaryFunction defines the single method
> > double apply(double d); ?  Any user who wishes to implement RealVector
> > (to, say, make a more efficient specialized SparseVector) has to go
> > through the pain of writing up a million methods dealing with these (and
> > even if copy/paste gets most of this, it still leads to some horribly
> > huge .java files filled with junk that does not appear to be used).
> > There does not even appear to be an AbstractRealVector which implements
> > all of these for you (by using the above-mentioned iterator()).
> >
> >  * while we're at it, if there is map(), why not also double
> > RealVector.collect(Collector()), where Collector defines void
> > collect(int index, double value); and double result(); - this can be
> > used for generic inner products and kernels (and can allow for
> > consolidating all of the L1Norm(), norm(), and LInfNorm() methods into
> > this same method, passing in different L1NormCollector() etc...
> > instances).
> >
> >  * why all the methods overloaded to take either RealVector or double[]
> > (getDistance, dotProduct, add, etc...)?  Is there really that much
> > overhead in just implementing dotProduct(double[] d) as
> > dotProduct(new ArrayRealVector(d, false))?  No copy is done; nothing is
> > done but one object creation...
> >
> >  * SparseVector is just a marker interface?  Does it serve any purpose?
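A minimal sketch of how the first three suggestions (a non-zero iterator, a single map()/mapToSelf(), and collect()) could fit together. All names here (UnaryFunction, Collector, VectorEntry, AbstractVectorSketch, TreeMapSparseVector, L1NormCollector, LInfNormCollector) are hypothetical and not part of the commons-math API. The abstract class implements map(), mapToSelf() and collect() once, on top of get/set and an iterator over stored entries, so a sparse subclass only supplies its storage. One subtlety it handles: visiting only the non-zero entries is correct for mapToSelf() only when f(0) = 0; otherwise every index must be visited.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical one-method function type: one map() instead of many mapXXX().
interface UnaryFunction { double apply(double d); }

// Hypothetical fold over (index, value) pairs, e.g. for norms or kernels.
interface Collector {
    void collect(int index, double value);
    double result();
}

// Immutable (index, value) pair handed out by the non-zero iterator.
final class VectorEntry {
    final int index; final double value;
    VectorEntry(int index, double value) { this.index = index; this.value = value; }
}

// Implements map(), mapToSelf() and collect() once, on top of a few primitives.
abstract class AbstractVectorSketch {
    public abstract int dimension();
    public abstract double get(int index);
    public abstract void set(int index, double value);
    public abstract AbstractVectorSketch copy();
    // Visits stored entries only; a sparse subclass never stores zeros.
    public abstract Iterator<VectorEntry> nonZeroIterator();

    public AbstractVectorSketch map(UnaryFunction f) { return copy().mapToSelf(f); }

    public AbstractVectorSketch mapToSelf(UnaryFunction f) {
        if (f.apply(0.0) == 0.0) {
            // f keeps zeros at zero, so only stored entries can change.
            // Snapshot first so that set() cannot disturb the live iterator.
            List<VectorEntry> snapshot = new ArrayList<>();
            nonZeroIterator().forEachRemaining(snapshot::add);
            for (VectorEntry e : snapshot) {
                set(e.index, f.apply(e.value));
            }
        } else {
            // f(0) != 0 changes every entry, so all indices must be visited.
            for (int i = 0; i < dimension(); i++) {
                set(i, f.apply(get(i)));
            }
        }
        return this;
    }

    // Feeds stored entries only, which is valid when zeros contribute nothing.
    public double collect(Collector c) {
        nonZeroIterator().forEachRemaining(e -> c.collect(e.index, e.value));
        return c.result();
    }
}

final class L1NormCollector implements Collector {
    private double sum;
    public void collect(int index, double value) { sum += Math.abs(value); }
    public double result() { return sum; }
}

final class LInfNormCollector implements Collector {
    private double max;
    public void collect(int index, double value) { max = Math.max(max, Math.abs(value)); }
    public double result() { return max; }
}

// Sparse vector backed by a TreeMap; zero values are never stored.
final class TreeMapSparseVector extends AbstractVectorSketch {
    private final int dim;
    private final TreeMap<Integer, Double> entries = new TreeMap<>();

    TreeMapSparseVector(int dim) { this.dim = dim; }

    public int dimension() { return dim; }
    public double get(int index) { return entries.getOrDefault(index, 0.0); }
    public void set(int index, double value) {
        if (value == 0.0) { entries.remove(index); } else { entries.put(index, value); }
    }
    public AbstractVectorSketch copy() {
        TreeMapSparseVector c = new TreeMapSparseVector(dim);
        c.entries.putAll(entries);
        return c;
    }
    public Iterator<VectorEntry> nonZeroIterator() {
        Iterator<Map.Entry<Integer, Double>> it = entries.entrySet().iterator();
        return new Iterator<VectorEntry>() {
            public boolean hasNext() { return it.hasNext(); }
            public VectorEntry next() {
                Map.Entry<Integer, Double> e = it.next();
                return new VectorEntry(e.getKey(), e.getValue());
            }
        };
    }
}
```

With this shape, v.map(x -> 2 * x) stays on the sparse path, v.map(x -> x + 1) falls back to the dense loop, and v.collect(new L1NormCollector()) stands in for a dedicated L1-norm method.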
> >
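The delegation proposed for the double[] overloads can be sketched as follows. Vec and DenseVec are hypothetical stand-ins, not the commons-math RealVector and ArrayRealVector classes; the copyArray flag mirrors the ArrayRealVector(d, false) call mentioned above. The arithmetic is written once against the interface, and the double[] overload costs only a wrapper allocation.

```java
import java.util.Arrays;

// Hypothetical minimal vector interface, not the commons-math RealVector.
interface Vec {
    int getDimension();
    double getEntry(int index);

    // The one real implementation, written against the interface.
    default double dotProduct(Vec other) {
        if (other.getDimension() != getDimension()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < getDimension(); i++) {
            sum += getEntry(i) * other.getEntry(i);
        }
        return sum;
    }

    // The double[] overload just wraps the array: one allocation, no copy.
    default double dotProduct(double[] d) {
        return dotProduct(new DenseVec(d, false));
    }
}

// Dense vector; copyArray == false shares the caller's array instead of copying it.
final class DenseVec implements Vec {
    private final double[] data;

    DenseVec(double[] d, boolean copyArray) {
        this.data = copyArray ? Arrays.copyOf(d, d.length) : d;
    }

    public int getDimension() { return data.length; }
    public double getEntry(int index) { return data[index]; }
}
```

Any other Vec implementation (a sparse one, say) would inherit the double[] overload without writing it again.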
> > I guess I could ask similar questions about the Matrix interfaces, but
> > those will probably be cleared up by understanding the philosophy behind
> > the Vector interfaces.
> >
> > I'd love to use commons-math for parts of my projects in which the entire
> > data sets can live in memory (often part of the computation falls into
> > this category; even if it's not the meatiest part, it's big enough that
> > I'll kill my performance if I am stuck writing my own subroutines for
> > eigen computation, etc. for many moderately small matrices), but
> > converting to and from the commons-math linear interfaces seems a bit
> > unwieldy.  Maybe it would be easier if I could understand why these are
> > the way they are.
> >
> > I'm happy to contribute patches consolidating interfaces and/or extending
> > functionality (you seem to be missing a compact int/double pair
> > implementation of sparse vectors, for example, which are a fantastically
> > performant format if they're immutable and only being used for dot
> > products and adding them to dense vectors), if it would be of help (I'm
> > tracking my attempts at this over on my GitHub clone of trunk:
> > http://github.com/jakemannix/commons-math ).
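A minimal sketch of the compact int/double pair format described above, assuming a hypothetical PackedSparseVector class: strictly increasing indices and their values sit in two parallel arrays, copied on construction so the vector stays immutable, and only the two operations named in the message are provided (a dot product with a dense vector, and adding into a dense vector). The scale argument on addTo is an assumption of this sketch; pass 1.0 for plain addition. Both loops do work proportional to the number of non-zeros and read the packed arrays sequentially, which is presumably why the format performs well.

```java
import java.util.Arrays;

// Hypothetical immutable sparse vector stored as parallel arrays:
// indices[k] is the position of the k-th non-zero, values[k] its value.
final class PackedSparseVector {
    private final int dimension;
    private final int[] indices;   // strictly increasing, all in [0, dimension)
    private final double[] values;

    PackedSparseVector(int dimension, int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values differ in length");
        }
        for (int k = 0; k < indices.length; k++) {
            boolean inRange = indices[k] >= 0 && indices[k] < dimension;
            boolean increasing = k == 0 || indices[k] > indices[k - 1];
            if (!inRange || !increasing) {
                throw new IllegalArgumentException("bad index at position " + k);
            }
        }
        this.dimension = dimension;
        // Defensive copies keep the vector immutable.
        this.indices = Arrays.copyOf(indices, indices.length);
        this.values = Arrays.copyOf(values, values.length);
    }

    int getDimension() { return dimension; }
    int nonZeroCount() { return indices.length; }

    // Dot product with a dense array, touching only the non-zero positions.
    double dot(double[] dense) {
        checkDimension(dense);
        double sum = 0.0;
        for (int k = 0; k < indices.length; k++) {
            sum += values[k] * dense[indices[k]];
        }
        return sum;
    }

    // dense += scale * this, in place, touching only the non-zero positions.
    void addTo(double[] dense, double scale) {
        checkDimension(dense);
        for (int k = 0; k < indices.length; k++) {
            dense[indices[k]] += scale * values[k];
        }
    }

    private void checkDimension(double[] dense) {
        if (dense.length != dimension) {
            throw new IllegalArgumentException(
                "expected length " + dimension + ", got " + dense.length);
        }
    }
}
```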
> >
> >  -jake mannix
> >  Principal Software Engineer
> >  Search and Recommender Systems
> >  LinkedIn.com
> >
> 
> 
> 
> -- 
> Ted Dunning, CTO
> DeepDyve

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

