commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luc Maisonobe <>
Subject Re: [math] patch submitted for commons-math sparse matrix support
Date Wed, 10 Dec 2008 19:39:23 GMT
Bill Barker a écrit :
> I've recently come up with a need in my day-job for a 
> SparseMatrix/SparseVector implementation, so would would like to offer to 
> help on this project.  I should still have karma for commons (my Apache id 
> is billbarker@a.o for those that haven't been around since my last commons 
> commit. I'm primarily a Tomcat committer in Apache-land, but have worked on 
> Modeler and Daemon as well).  I have a degree in math, so I know my way 
> around linear algebra.  I've also been around in commons long enough to know 
> that I should ask nicely first before starting to commit ;).
> I'm still in the process of reviewing the patches, so don't have any 
> specific recommendations yet.  I'll re-post when I do.

I have reviewed the patch on my side. It is mainly OK for me. I would
have prefered a separate class for the Map as I said in the comment on
JIRA, but this is not mandatory.

The patch is huge, but the largest part is dues to tests, many of them
being borrowed from the existing RealMatrixTest suite, so it is OK. The
new classes are already provided with appropriate Apache header by the
authors, so I think this could be considered sufficient for IP
clearance. Does anybody think we should ask for more ?

Bill, what do you think about the patch ?


> "Sujit Pal" <> wrote in message 
>> I've updated MATH-230 with a fresh patch (this time against trunk, since
>> MATH_2_0 has been moved to trunk) putting back the fallback code Luc
>> pointed out was used for performance.
>> Details are in the bug:
>> There are also some test times for running with the current version vs
>> the getEntry() stuff I put in, and for the current implementation. The
>> performance does degrade for large matrices with getEntry(), but not by
>> much for the sizes I was looking at, but since the complexity is
>> O(n**2), there is going to be more degradation at larger sizes. I wasn't
>> able to get the test to complete in either case in a reasonable length
>> of time (~10s).
>> size               data[][]  getEntry()  restored data[][]
>> [3x2]*[2x4]          0ms         0ms           0ms
>> [6x4]*[4x8])         0ms         0ms           0ms
>> [12x8]*[8x16])       1ms         0ms           0ms
>> [24x16]*[16x32]      2ms         2ms           2ms
>> [48x32]*[32x64])     14ms        16ms          23ms
>> [96x64]*[64x128]     2ms         3ms           3ms
>> [192x128]*[128x256]  24ms        50ms          28ms
>> [384x256]*[256x512]  770ms       813ms         768ms
>> One thing I would like to suggest. As a business programmer, I may have
>> to deal with matrices, but I would prefer not have to deal with
>> implementing individual matrix methods. Which is one reason someone like
>> me would want to use a library like commons-math in the first place.
>> However, as I found recently, I may have to make different
>> implementations of RealMatrix, which I would prefer to do by subclassing
>> and overriding the getEntry() and setEntry() methods. Many algorithms
>> are based on matrices (and they probably assume a JVM with a huge amount
>> of memory), so it makes the code cleaner to use matrices as an
>> abstraction, but I could use a Map, Lucene index, database table, etc as
>> storage for my data. Would it make sense to have a generic impl of
>> RealMatrix, perhaps called AbstractRealMatrixImpl where the getEntry and
>> setEntry are abstract, which is meant for this sort of application?
>> Obviously, this implies that there is a guarantee that getEntry and
>> setEntry are used uniformly throughout the code, except for special
>> cases such as RealMatrixImpl.
>> -sujit
>> On Tue, 2008-12-02 at 14:19 +0100, wrote:
>>> ----- "Ted Dunning" <> a écrit :
>>>> Is performance in LUD normally achieved by fast element access?  Or is
>>>> it
>>>> done by using higher order operators on a view of a row (or column)?
>>> This is not used for LUD but only for simpler operations (multiplication, 
>>> addition, subtraction). For these operations, the algorithm is almost 
>>> reduced to one operation, so a single getEntry() instead of a direct 
>>> access result is really a performance bottleneck. This does not mean it 
>>> is the only point to improve, but it was the easiest one.
>>> The more I think about it, the more I feel your proposal to change the 
>>> underlying layout is the way to go. I am ready to do it if you want once 
>>> I have completed a first working implementation for SVD (even if this 
>>> first implementation is not fast enough). I would however be interested 
>>> if you could provide a few pointers to me, as you may have noticed, 
>>> linear algebra is not my main field of activity.
>>> What would you suggest ? How is storage handled in colt ?
>>> Luc
>>>> I know the the Colt package made the argument that the latter it far
>>>> preferable.  I also know that Jama didn't do this, but was very hard
>>>> to
>>>> extend as a result.
>>>> The colt approach was to require the matrices provide row, column and
>>>> submatrix view operations and that vectors and matrices support
>>>> functionally
>>>> oriented destructive updates.  This can be hard for users to
>>>> understand so
>>>> you generally have to provide more algebraic interfaces as well, but
>>>> it can
>>>> work wonders for performance.
>>>> On Mon, Dec 1, 2008 at 2:52 PM, Luc Maisonobe <>
>>>> wrote:
>>>>> For now, one thing  that bother me is the removal of the dedicated
>>>> code
>>>>> that explicitely avoided getEntry(). This was added a few months ago
>>>> for
>>>>> performance reason, since the previous generic code was painfully
>>>> slow.
>>>>> The trick with the ClassCastException allows really fast check since
>>>> the
>>>>> virtual machine can completely optimize it out in some cases, it is
>>>> an
>>>>> enhancement of what was discussed here:
>>>>> Our discussion at
>>>> that
>>>>> time was that the more common case (RealMatrixImpl) should be as
>>>>> efficient as possible (and Ted wants now it to be even faster by
>>>>> changing the underlying storage which is a good thing). This trick
>>>> is
>>>>> not beautiful perfectly generic code, it is a pragmatic way to be
>>>> fast
>>>>> in a common case and removing it will most probably induce poorer
>>>>> performances.
>>>> -- 
>>>> Ted Dunning, CTO
>>>> DeepDyve
>>>> 4600 Bohannon Drive, Suite 220
>>>> Menlo Park, CA 94025
>>>> 650-324-0110, ext. 738
>>>> 858-414-0013 (m)
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message