commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luc Maisonobe <>
Subject Re: [math] patch submitted for commons-math sparse matrix support
Date Wed, 03 Dec 2008 21:53:19 GMT
Sujit Pal a écrit :
> Thanks, added it to JIRA:
> Also, please let me know if you want me to go ahead and submit a patch
> for this.

No, this is not needed, thanks.

> I also noticed that MATH-230 is open/unassigned. The patch is already
> in, so all that remains is to review it before deciding to check in or
> not. Not sure if I need to do anything here, if yes, please let me know.

I'll review it. I am already late but in the last few days, many
interesting issues have been discussed about the linear algebra package.
The current tasks for me are (in random order, counting only linear
  - completing singular value decomposition implementation
  - reworking the decomposition API for all algorithms
  - thinking about changing double[][] to something faster
  - reviewing your patch for MATH-230
  - extracting the abstract class for MATH-231
  - adding a few new methods for row/column handling someone asked for
  - thinking about views/selections
  - thinking about field genericity
  - benchmarking


> -sujit
> On Tue, 2008-12-02 at 23:02 +0100, Luc Maisonobe wrote:
>> Sujit Pal a écrit :
>>> I've updated MATH-230 with a fresh patch (this time against trunk, since
>>> MATH_2_0 has been moved to trunk) putting back the fallback code Luc
>>> pointed out was used for performance.
>> Thanks.
>>> Details are in the bug:
>>> There are also some test times for running with the current version vs
>>> the getEntry() stuff I put in, and for the current implementation. The
>>> performance does degrade for large matrices with getEntry(), but not by
>>> much for the sizes I was looking at, but since the complexity is
>>> O(n**2), there is going to be more degradation at larger sizes. I wasn't
>>> able to get the test to complete in either case in a reasonable length
>>> of time (~10s).
>>> size               data[][]  getEntry()  restored data[][] 
>>> [3x2]*[2x4]          0ms         0ms           0ms
>>> [6x4]*[4x8])         0ms         0ms           0ms
>>> [12x8]*[8x16])       1ms         0ms           0ms
>>> [24x16]*[16x32]      2ms         2ms           2ms
>>> [48x32]*[32x64])     14ms        16ms          23ms
>>> [96x64]*[64x128]     2ms         3ms           3ms
>>> [192x128]*[128x256]  24ms        50ms          28ms
>>> [384x256]*[256x512]  770ms       813ms         768ms
>> These differences are very small. The benchmark I did some months ago
>> were rather ratios about 2 or 3 (but there where also other problems
>> like spurious bounds checkings).
>>> One thing I would like to suggest. As a business programmer, I may have
>>> to deal with matrices, but I would prefer not have to deal with
>>> implementing individual matrix methods. Which is one reason someone like
>>> me would want to use a library like commons-math in the first place.
>>> However, as I found recently, I may have to make different
>>> implementations of RealMatrix, which I would prefer to do by subclassing
>>> and overriding the getEntry() and setEntry() methods. Many algorithms
>>> are based on matrices (and they probably assume a JVM with a huge amount
>>> of memory), so it makes the code cleaner to use matrices as an
>>> abstraction, but I could use a Map, Lucene index, database table, etc as
>>> storage for my data. Would it make sense to have a generic impl of
>>> RealMatrix, perhaps called AbstractRealMatrixImpl where the getEntry and
>>> setEntry are abstract, which is meant for this sort of application?
>>> Obviously, this implies that there is a guarantee that getEntry and
>>> setEntry are used uniformly throughout the code, except for special
>>> cases such as RealMatrixImpl.
>> This does make sense to me as we seem to have several implementations in
>> the near future. This was probably overkill with our single
>> implementation up to now. It is also quite easy to do with modern
>> development environments that allow refactoring, interface extraction
>> and so on ...
>> Could you please open a dedicated JIRA issue for this specific point so
>> we don't forget it ?
>> thanks,
>> Luc
>>> -sujit
>>> On Tue, 2008-12-02 at 14:19 +0100, wrote:
>>>> ----- "Ted Dunning" <> a écrit :
>>>>> Is performance in LUD normally achieved by fast element access?  Or is
>>>>> it
>>>>> done by using higher order operators on a view of a row (or column)?
>>>> This is not used for LUD but only for simpler operations (multiplication,
addition, subtraction). For these operations, the algorithm is almost reduced to one operation,
so a single getEntry() instead of a direct access result is really a performance bottleneck.
This does not mean it is the only point to improve, but it was the easiest one.
>>>> The more I think about it, the more I feel your proposal to change the underlying
layout is the way to go. I am ready to do it if you want once I have completed a first working
implementation for SVD (even if this first implementation is not fast enough). I would however
be interested if you could provide a few pointers to me, as you may have noticed, linear algebra
is not my main field of activity.
>>>> What would you suggest ? How is storage handled in colt ?
>>>> Luc
>>>>> I know the the Colt package made the argument that the latter it far
>>>>> preferable.  I also know that Jama didn't do this, but was very hard
>>>>> to
>>>>> extend as a result.
>>>>> The colt approach was to require the matrices provide row, column and
>>>>> submatrix view operations and that vectors and matrices support
>>>>> functionally
>>>>> oriented destructive updates.  This can be hard for users to
>>>>> understand so
>>>>> you generally have to provide more algebraic interfaces as well, but
>>>>> it can
>>>>> work wonders for performance.
>>>>> On Mon, Dec 1, 2008 at 2:52 PM, Luc Maisonobe <>
>>>>> wrote:
>>>>>> For now, one thing  that bother me is the removal of the dedicated
>>>>> code
>>>>>> that explicitely avoided getEntry(). This was added a few months
>>>>> for
>>>>>> performance reason, since the previous generic code was painfully
>>>>> slow.
>>>>>> The trick with the ClassCastException allows really fast check since
>>>>> the
>>>>>> virtual machine can completely optimize it out in some cases, it
>>>>> an
>>>>>> enhancement of what was discussed here:
>>>>>> Our discussion at
>>>>> that
>>>>>> time was that the more common case (RealMatrixImpl) should be as
>>>>>> efficient as possible (and Ted wants now it to be even faster by
>>>>>> changing the underlying storage which is a good thing). This trick
>>>>> is
>>>>>> not beautiful perfectly generic code, it is a pragmatic way to be
>>>>> fast
>>>>>> in a common case and removing it will most probably induce poorer
>>>>>> performances.
>>>>> -- 
>>>>> Ted Dunning, CTO
>>>>> DeepDyve
>>>>> 4600 Bohannon Drive, Suite 220
>>>>> Menlo Park, CA 94025
>>>>> 650-324-0110, ext. 738
>>>>> 858-414-0013 (m)
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail:
>>>> For additional commands, e-mail:
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message