commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sujit Pal <sujit....@comcast.net>
Subject Re: [math] patch submitted for commons-math sparse matrix support
Date Wed, 03 Dec 2008 21:58:49 GMT
Ok, thanks very much.

-sujit

On Wed, 2008-12-03 at 22:53 +0100, Luc Maisonobe wrote:
> Sujit Pal a écrit :
> > Thanks, added it to JIRA:
> > https://issues.apache.org/jira/browse/MATH-231
> > 
> > Also, please let me know if you want me to go ahead and submit a patch
> > for this.
> 
> No, this is not needed, thanks.
> 
> > 
> > I also noticed that MATH-230 is open/unassigned. The patch is already
> > in, so all that remains is to review it before deciding to check in or
> > not. Not sure if I need to do anything here, if yes, please let me know.
> 
> I'll review it. I am already late but in the last few days, many
> interesting issues have been discussed about the linear algebra package.
> The current tasks for me are (in random order, counting only linear
> algebra):
>   - completing singular value decomposition implementation
>   - reworking the decomposition API for all algorithms
>   - thinking about changing double[][] to something faster
>   - reviewing your patch for MATH-230
>   - extracting the abstract class for MATH-231
>   - adding a few new methods for row/column handling someone asked for
>     off-list
>   - thinking about views/selections
>   - thinking about field genericity
>   - benchmarking
> 
> Luc
> 
> > 
> > -sujit
> > 
> > On Tue, 2008-12-02 at 23:02 +0100, Luc Maisonobe wrote:
> >> Sujit Pal a écrit :
> >>> I've updated MATH-230 with a fresh patch (this time against trunk, since
> >>> MATH_2_0 has been moved to trunk) putting back the fallback code Luc
> >>> pointed out was used for performance.
> >> Thanks.
> >>
> >>> Details are in the bug:
> >>> http://issues.apache.org/jira/browse/MATH-230
> >>>
> >>> There are also some test times for running with the current version vs
> >>> the getEntry() stuff I put in, and for the current implementation. The
> >>> performance does degrade for large matrices with getEntry(), but not by
> >>> much for the sizes I was looking at, but since the complexity is
> >>> O(n**2), there is going to be more degradation at larger sizes. I wasn't
> >>> able to get the test to complete in either case in a reasonable length
> >>> of time (~10s).
> >>>
> >>> size               data[][]  getEntry()  restored data[][] 
> >>> [3x2]*[2x4]          0ms         0ms           0ms
> >>> [6x4]*[4x8])         0ms         0ms           0ms
> >>> [12x8]*[8x16])       1ms         0ms           0ms
> >>> [24x16]*[16x32]      2ms         2ms           2ms
> >>> [48x32]*[32x64])     14ms        16ms          23ms
> >>> [96x64]*[64x128]     2ms         3ms           3ms
> >>> [192x128]*[128x256]  24ms        50ms          28ms
> >>> [384x256]*[256x512]  770ms       813ms         768ms
> >> These differences are very small. The benchmark I did some months ago
> >> were rather ratios about 2 or 3 (but there where also other problems
> >> like spurious bounds checkings).
> >>
> >>> One thing I would like to suggest. As a business programmer, I may have
> >>> to deal with matrices, but I would prefer not have to deal with
> >>> implementing individual matrix methods. Which is one reason someone like
> >>> me would want to use a library like commons-math in the first place.
> >>> However, as I found recently, I may have to make different
> >>> implementations of RealMatrix, which I would prefer to do by subclassing
> >>> and overriding the getEntry() and setEntry() methods. Many algorithms
> >>> are based on matrices (and they probably assume a JVM with a huge amount
> >>> of memory), so it makes the code cleaner to use matrices as an
> >>> abstraction, but I could use a Map, Lucene index, database table, etc as
> >>> storage for my data. Would it make sense to have a generic impl of
> >>> RealMatrix, perhaps called AbstractRealMatrixImpl where the getEntry and
> >>> setEntry are abstract, which is meant for this sort of application?
> >>> Obviously, this implies that there is a guarantee that getEntry and
> >>> setEntry are used uniformly throughout the code, except for special
> >>> cases such as RealMatrixImpl.
> >> This does make sense to me as we seem to have several implementations in
> >> the near future. This was probably overkill with our single
> >> implementation up to now. It is also quite easy to do with modern
> >> development environments that allow refactoring, interface extraction
> >> and so on ...
> >> Could you please open a dedicated JIRA issue for this specific point so
> >> we don't forget it ?
> >>
> >> thanks,
> >> Luc
> >>
> >>> -sujit
> >>>
> >>> On Tue, 2008-12-02 at 14:19 +0100, luc.maisonobe@free.fr wrote:
> >>>> ----- "Ted Dunning" <ted.dunning@gmail.com> a écrit :
> >>>>
> >>>>> Is performance in LUD normally achieved by fast element access?
 Or is
> >>>>> it
> >>>>> done by using higher order operators on a view of a row (or column)?
> >>>> This is not used for LUD but only for simpler operations (multiplication,
addition, subtraction). For these operations, the algorithm is almost reduced to one operation,
so a single getEntry() instead of a direct access result is really a performance bottleneck.
This does not mean it is the only point to improve, but it was the easiest one.
> >>>>
> >>>> The more I think about it, the more I feel your proposal to change the
underlying layout is the way to go. I am ready to do it if you want once I have completed
a first working implementation for SVD (even if this first implementation is not fast enough).
I would however be interested if you could provide a few pointers to me, as you may have noticed,
linear algebra is not my main field of activity.
> >>>>
> >>>> What would you suggest ? How is storage handled in colt ?
> >>>>
> >>>> Luc
> >>>>  
> >>>>
> >>>>> I know the the Colt package made the argument that the latter it
far
> >>>>> preferable.  I also know that Jama didn't do this, but was very
hard
> >>>>> to
> >>>>> extend as a result.
> >>>>>
> >>>>> The colt approach was to require the matrices provide row, column
and
> >>>>> submatrix view operations and that vectors and matrices support
> >>>>> functionally
> >>>>> oriented destructive updates.  This can be hard for users to
> >>>>> understand so
> >>>>> you generally have to provide more algebraic interfaces as well,
but
> >>>>> it can
> >>>>> work wonders for performance.
> >>>>>
> >>>>> On Mon, Dec 1, 2008 at 2:52 PM, Luc Maisonobe <Luc.Maisonobe@free.fr>
> >>>>> wrote:
> >>>>>
> >>>>>> For now, one thing  that bother me is the removal of the dedicated
> >>>>> code
> >>>>>> that explicitely avoided getEntry(). This was added a few months
ago
> >>>>> for
> >>>>>> performance reason, since the previous generic code was painfully
> >>>>> slow.
> >>>>>> The trick with the ClassCastException allows really fast check
since
> >>>>> the
> >>>>>> virtual machine can completely optimize it out in some cases,
it is
> >>>>> an
> >>>>>> enhancement of what was discussed here:
> >>>>>> http://markmail.org/message/36fgg7vx6gzd3ziu. Our discussion
at
> >>>>> that
> >>>>>> time was that the more common case (RealMatrixImpl) should be
as
> >>>>>> efficient as possible (and Ted wants now it to be even faster
by
> >>>>>> changing the underlying storage which is a good thing). This
trick
> >>>>> is
> >>>>>> not beautiful perfectly generic code, it is a pragmatic way
to be
> >>>>> fast
> >>>>>> in a common case and removing it will most probably induce poorer
> >>>>>> performances.
> >>>>>>
> >>>>>
> >>>>> -- 
> >>>>> Ted Dunning, CTO
> >>>>> DeepDyve
> >>>>> 4600 Bohannon Drive, Suite 220
> >>>>> Menlo Park, CA 94025
> >>>>> www.deepdyve.com
> >>>>> 650-324-0110, ext. 738
> >>>>> 858-414-0013 (m)
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >> For additional commands, e-mail: dev-help@commons.apache.org
> >>
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message