commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sujit Pal <sujit....@comcast.net>
Subject Re: [math] patch submitted for commons-math sparse matrix support
Date Wed, 03 Dec 2008 21:24:12 GMT
Thanks, added it to JIRA:
https://issues.apache.org/jira/browse/MATH-231

Also, please let me know if you want me to go ahead and submit a patch
for this.

I also noticed that MATH-230 is open/unassigned. The patch is already
in, so all that remains is to review it before deciding to check in or
not. Not sure if I need to do anything here, if yes, please let me know.

-sujit

On Tue, 2008-12-02 at 23:02 +0100, Luc Maisonobe wrote:
> Sujit Pal a écrit :
> > I've updated MATH-230 with a fresh patch (this time against trunk, since
> > MATH_2_0 has been moved to trunk) putting back the fallback code Luc
> > pointed out was used for performance.
> 
> Thanks.
> 
> > 
> > Details are in the bug:
> > http://issues.apache.org/jira/browse/MATH-230
> > 
> > There are also some test times for running with the current version vs
> > the getEntry() stuff I put in, and for the current implementation. The
> > performance does degrade for large matrices with getEntry(), but not by
> > much for the sizes I was looking at, but since the complexity is
> > O(n**2), there is going to be more degradation at larger sizes. I wasn't
> > able to get the test to complete in either case in a reasonable length
> > of time (~10s).
> > 
> > size               data[][]  getEntry()  restored data[][] 
> > [3x2]*[2x4]          0ms         0ms           0ms
> > [6x4]*[4x8])         0ms         0ms           0ms
> > [12x8]*[8x16])       1ms         0ms           0ms
> > [24x16]*[16x32]      2ms         2ms           2ms
> > [48x32]*[32x64])     14ms        16ms          23ms
> > [96x64]*[64x128]     2ms         3ms           3ms
> > [192x128]*[128x256]  24ms        50ms          28ms
> > [384x256]*[256x512]  770ms       813ms         768ms
> 
> These differences are very small. The benchmark I did some months ago
> were rather ratios about 2 or 3 (but there where also other problems
> like spurious bounds checkings).
> 
> > 
> > One thing I would like to suggest. As a business programmer, I may have
> > to deal with matrices, but I would prefer not have to deal with
> > implementing individual matrix methods. Which is one reason someone like
> > me would want to use a library like commons-math in the first place.
> > However, as I found recently, I may have to make different
> > implementations of RealMatrix, which I would prefer to do by subclassing
> > and overriding the getEntry() and setEntry() methods. Many algorithms
> > are based on matrices (and they probably assume a JVM with a huge amount
> > of memory), so it makes the code cleaner to use matrices as an
> > abstraction, but I could use a Map, Lucene index, database table, etc as
> > storage for my data. Would it make sense to have a generic impl of
> > RealMatrix, perhaps called AbstractRealMatrixImpl where the getEntry and
> > setEntry are abstract, which is meant for this sort of application?
> > Obviously, this implies that there is a guarantee that getEntry and
> > setEntry are used uniformly throughout the code, except for special
> > cases such as RealMatrixImpl.
> 
> This does make sense to me as we seem to have several implementations in
> the near future. This was probably overkill with our single
> implementation up to now. It is also quite easy to do with modern
> development environments that allow refactoring, interface extraction
> and so on ...
> Could you please open a dedicated JIRA issue for this specific point so
> we don't forget it ?
> 
> thanks,
> Luc
> 
> > 
> > -sujit
> > 
> > On Tue, 2008-12-02 at 14:19 +0100, luc.maisonobe@free.fr wrote:
> >> ----- "Ted Dunning" <ted.dunning@gmail.com> a écrit :
> >>
> >>> Is performance in LUD normally achieved by fast element access?  Or is
> >>> it
> >>> done by using higher order operators on a view of a row (or column)?
> >> This is not used for LUD but only for simpler operations (multiplication, addition,
subtraction). For these operations, the algorithm is almost reduced to one operation, so a
single getEntry() instead of a direct access result is really a performance bottleneck. This
does not mean it is the only point to improve, but it was the easiest one.
> >>
> >> The more I think about it, the more I feel your proposal to change the underlying
layout is the way to go. I am ready to do it if you want once I have completed a first working
implementation for SVD (even if this first implementation is not fast enough). I would however
be interested if you could provide a few pointers to me, as you may have noticed, linear algebra
is not my main field of activity.
> >>
> >> What would you suggest ? How is storage handled in colt ?
> >>
> >> Luc
> >>  
> >>
> >>> I know the the Colt package made the argument that the latter it far
> >>> preferable.  I also know that Jama didn't do this, but was very hard
> >>> to
> >>> extend as a result.
> >>>
> >>> The colt approach was to require the matrices provide row, column and
> >>> submatrix view operations and that vectors and matrices support
> >>> functionally
> >>> oriented destructive updates.  This can be hard for users to
> >>> understand so
> >>> you generally have to provide more algebraic interfaces as well, but
> >>> it can
> >>> work wonders for performance.
> >>>
> >>> On Mon, Dec 1, 2008 at 2:52 PM, Luc Maisonobe <Luc.Maisonobe@free.fr>
> >>> wrote:
> >>>
> >>>> For now, one thing  that bother me is the removal of the dedicated
> >>> code
> >>>> that explicitely avoided getEntry(). This was added a few months ago
> >>> for
> >>>> performance reason, since the previous generic code was painfully
> >>> slow.
> >>>> The trick with the ClassCastException allows really fast check since
> >>> the
> >>>> virtual machine can completely optimize it out in some cases, it is
> >>> an
> >>>> enhancement of what was discussed here:
> >>>> http://markmail.org/message/36fgg7vx6gzd3ziu. Our discussion at
> >>> that
> >>>> time was that the more common case (RealMatrixImpl) should be as
> >>>> efficient as possible (and Ted wants now it to be even faster by
> >>>> changing the underlying storage which is a good thing). This trick
> >>> is
> >>>> not beautiful perfectly generic code, it is a pragmatic way to be
> >>> fast
> >>>> in a common case and removing it will most probably induce poorer
> >>>> performances.
> >>>>
> >>>
> >>>
> >>> -- 
> >>> Ted Dunning, CTO
> >>> DeepDyve
> >>> 4600 Bohannon Drive, Suite 220
> >>> Menlo Park, CA 94025
> >>> www.deepdyve.com
> >>> 650-324-0110, ext. 738
> >>> 858-414-0013 (m)
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >> For additional commands, e-mail: dev-help@commons.apache.org
> >>
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> > 
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message