commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sujit Pal <sujit....@comcast.net>
Subject Re: [math] patch submitted for commons-math sparse matrix support
Date Tue, 02 Dec 2008 20:40:56 GMT
I've updated MATH-230 with a fresh patch (this time against trunk, since
MATH_2_0 has been moved to trunk) putting back the fallback code Luc
pointed out was used for performance.

Details are in the bug:
http://issues.apache.org/jira/browse/MATH-230

There are also some test times for running with the current version vs
the getEntry() stuff I put in, and for the current implementation. The
performance does degrade for large matrices with getEntry(), but not by
much for the sizes I was looking at, but since the complexity is
O(n**2), there is going to be more degradation at larger sizes. I wasn't
able to get the test to complete in either case in a reasonable length
of time (~10s).

size               data[][]  getEntry()  restored data[][] 
[3x2]*[2x4]          0ms         0ms           0ms
[6x4]*[4x8])         0ms         0ms           0ms
[12x8]*[8x16])       1ms         0ms           0ms
[24x16]*[16x32]      2ms         2ms           2ms
[48x32]*[32x64])     14ms        16ms          23ms
[96x64]*[64x128]     2ms         3ms           3ms
[192x128]*[128x256]  24ms        50ms          28ms
[384x256]*[256x512]  770ms       813ms         768ms

One thing I would like to suggest. As a business programmer, I may have
to deal with matrices, but I would prefer not have to deal with
implementing individual matrix methods. Which is one reason someone like
me would want to use a library like commons-math in the first place.
However, as I found recently, I may have to make different
implementations of RealMatrix, which I would prefer to do by subclassing
and overriding the getEntry() and setEntry() methods. Many algorithms
are based on matrices (and they probably assume a JVM with a huge amount
of memory), so it makes the code cleaner to use matrices as an
abstraction, but I could use a Map, Lucene index, database table, etc as
storage for my data. Would it make sense to have a generic impl of
RealMatrix, perhaps called AbstractRealMatrixImpl where the getEntry and
setEntry are abstract, which is meant for this sort of application?
Obviously, this implies that there is a guarantee that getEntry and
setEntry are used uniformly throughout the code, except for special
cases such as RealMatrixImpl.

-sujit

On Tue, 2008-12-02 at 14:19 +0100, luc.maisonobe@free.fr wrote:
> ----- "Ted Dunning" <ted.dunning@gmail.com> a écrit :
> 
> > Is performance in LUD normally achieved by fast element access?  Or is
> > it
> > done by using higher order operators on a view of a row (or column)?
> 
> This is not used for LUD but only for simpler operations (multiplication, addition, subtraction).
For these operations, the algorithm is almost reduced to one operation, so a single getEntry()
instead of a direct access result is really a performance bottleneck. This does not mean it
is the only point to improve, but it was the easiest one.
> 
> The more I think about it, the more I feel your proposal to change the underlying layout
is the way to go. I am ready to do it if you want once I have completed a first working implementation
for SVD (even if this first implementation is not fast enough). I would however be interested
if you could provide a few pointers to me, as you may have noticed, linear algebra is not
my main field of activity.
> 
> What would you suggest ? How is storage handled in colt ?
> 
> Luc
>  
> 
> > 
> > I know the the Colt package made the argument that the latter it far
> > preferable.  I also know that Jama didn't do this, but was very hard
> > to
> > extend as a result.
> > 
> > The colt approach was to require the matrices provide row, column and
> > submatrix view operations and that vectors and matrices support
> > functionally
> > oriented destructive updates.  This can be hard for users to
> > understand so
> > you generally have to provide more algebraic interfaces as well, but
> > it can
> > work wonders for performance.
> > 
> > On Mon, Dec 1, 2008 at 2:52 PM, Luc Maisonobe <Luc.Maisonobe@free.fr>
> > wrote:
> > 
> > > For now, one thing  that bother me is the removal of the dedicated
> > code
> > > that explicitely avoided getEntry(). This was added a few months ago
> > for
> > > performance reason, since the previous generic code was painfully
> > slow.
> > > The trick with the ClassCastException allows really fast check since
> > the
> > > virtual machine can completely optimize it out in some cases, it is
> > an
> > > enhancement of what was discussed here:
> > > http://markmail.org/message/36fgg7vx6gzd3ziu. Our discussion at
> > that
> > > time was that the more common case (RealMatrixImpl) should be as
> > > efficient as possible (and Ted wants now it to be even faster by
> > > changing the underlying storage which is a good thing). This trick
> > is
> > > not beautiful perfectly generic code, it is a pragmatic way to be
> > fast
> > > in a common case and removing it will most probably induce poorer
> > > performances.
> > >
> > 
> > 
> > 
> > -- 
> > Ted Dunning, CTO
> > DeepDyve
> > 4600 Bohannon Drive, Suite 220
> > Menlo Park, CA 94025
> > www.deepdyve.com
> > 650-324-0110, ext. 738
> > 858-414-0013 (m)
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message