commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bill Barker" <wbar...@wilshire.com>
Subject Re: [math] patch submitted for commons-math sparse matrix support
Date Wed, 10 Dec 2008 03:29:25 GMT
I've recently come up with a need in my day-job for a 
SparseMatrix/SparseVector implementation, so would would like to offer to 
help on this project.  I should still have karma for commons (my Apache id 
is billbarker@a.o for those that haven't been around since my last commons 
commit. I'm primarily a Tomcat committer in Apache-land, but have worked on 
Modeler and Daemon as well).  I have a degree in math, so I know my way 
around linear algebra.  I've also been around in commons long enough to know 
that I should ask nicely first before starting to commit ;).

I'm still in the process of reviewing the patches, so don't have any 
specific recommendations yet.  I'll re-post when I do.

"Sujit Pal" <sujit.pal@comcast.net> wrote in message 
news:1228250456.20401.36.camel@pluto.healthline.com...
> I've updated MATH-230 with a fresh patch (this time against trunk, since
> MATH_2_0 has been moved to trunk) putting back the fallback code Luc
> pointed out was used for performance.
>
> Details are in the bug:
> http://issues.apache.org/jira/browse/MATH-230
>
> There are also some test times for running with the current version vs
> the getEntry() stuff I put in, and for the current implementation. The
> performance does degrade for large matrices with getEntry(), but not by
> much for the sizes I was looking at, but since the complexity is
> O(n**2), there is going to be more degradation at larger sizes. I wasn't
> able to get the test to complete in either case in a reasonable length
> of time (~10s).
>
> size               data[][]  getEntry()  restored data[][]
> [3x2]*[2x4]          0ms         0ms           0ms
> [6x4]*[4x8])         0ms         0ms           0ms
> [12x8]*[8x16])       1ms         0ms           0ms
> [24x16]*[16x32]      2ms         2ms           2ms
> [48x32]*[32x64])     14ms        16ms          23ms
> [96x64]*[64x128]     2ms         3ms           3ms
> [192x128]*[128x256]  24ms        50ms          28ms
> [384x256]*[256x512]  770ms       813ms         768ms
>
> One thing I would like to suggest. As a business programmer, I may have
> to deal with matrices, but I would prefer not have to deal with
> implementing individual matrix methods. Which is one reason someone like
> me would want to use a library like commons-math in the first place.
> However, as I found recently, I may have to make different
> implementations of RealMatrix, which I would prefer to do by subclassing
> and overriding the getEntry() and setEntry() methods. Many algorithms
> are based on matrices (and they probably assume a JVM with a huge amount
> of memory), so it makes the code cleaner to use matrices as an
> abstraction, but I could use a Map, Lucene index, database table, etc as
> storage for my data. Would it make sense to have a generic impl of
> RealMatrix, perhaps called AbstractRealMatrixImpl where the getEntry and
> setEntry are abstract, which is meant for this sort of application?
> Obviously, this implies that there is a guarantee that getEntry and
> setEntry are used uniformly throughout the code, except for special
> cases such as RealMatrixImpl.
>
> -sujit
>
> On Tue, 2008-12-02 at 14:19 +0100, luc.maisonobe@free.fr wrote:
>> ----- "Ted Dunning" <ted.dunning@gmail.com> a écrit :
>>
>> > Is performance in LUD normally achieved by fast element access?  Or is
>> > it
>> > done by using higher order operators on a view of a row (or column)?
>>
>> This is not used for LUD but only for simpler operations (multiplication, 
>> addition, subtraction). For these operations, the algorithm is almost 
>> reduced to one operation, so a single getEntry() instead of a direct 
>> access result is really a performance bottleneck. This does not mean it 
>> is the only point to improve, but it was the easiest one.
>>
>> The more I think about it, the more I feel your proposal to change the 
>> underlying layout is the way to go. I am ready to do it if you want once 
>> I have completed a first working implementation for SVD (even if this 
>> first implementation is not fast enough). I would however be interested 
>> if you could provide a few pointers to me, as you may have noticed, 
>> linear algebra is not my main field of activity.
>>
>> What would you suggest ? How is storage handled in colt ?
>>
>> Luc
>>
>>
>> >
>> > I know the the Colt package made the argument that the latter it far
>> > preferable.  I also know that Jama didn't do this, but was very hard
>> > to
>> > extend as a result.
>> >
>> > The colt approach was to require the matrices provide row, column and
>> > submatrix view operations and that vectors and matrices support
>> > functionally
>> > oriented destructive updates.  This can be hard for users to
>> > understand so
>> > you generally have to provide more algebraic interfaces as well, but
>> > it can
>> > work wonders for performance.
>> >
>> > On Mon, Dec 1, 2008 at 2:52 PM, Luc Maisonobe <Luc.Maisonobe@free.fr>
>> > wrote:
>> >
>> > > For now, one thing  that bother me is the removal of the dedicated
>> > code
>> > > that explicitely avoided getEntry(). This was added a few months ago
>> > for
>> > > performance reason, since the previous generic code was painfully
>> > slow.
>> > > The trick with the ClassCastException allows really fast check since
>> > the
>> > > virtual machine can completely optimize it out in some cases, it is
>> > an
>> > > enhancement of what was discussed here:
>> > > http://markmail.org/message/36fgg7vx6gzd3ziu. Our discussion at
>> > that
>> > > time was that the more common case (RealMatrixImpl) should be as
>> > > efficient as possible (and Ted wants now it to be even faster by
>> > > changing the underlying storage which is a good thing). This trick
>> > is
>> > > not beautiful perfectly generic code, it is a pragmatic way to be
>> > fast
>> > > in a common case and removing it will most probably induce poorer
>> > > performances.
>> > >
>> >
>> >
>> >
>> > -- 
>> > Ted Dunning, CTO
>> > DeepDyve
>> > 4600 Bohannon Drive, Suite 220
>> > Menlo Park, CA 94025
>> > www.deepdyve.com
>> > 650-324-0110, ext. 738
>> > 858-414-0013 (m)
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>> 




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message