 "Ted Dunning" <ted.dunning@gmail.com> a écrit :
> I just read all of these references, but they all seem confined to
> relatively naive implementations of multiply. Notably, they didn't do
> the
> standard two levels of block decomposition that is typically necessary
> to
> get good register and cache hit statistics.
Yes and no. The core algorithms use a dedicated representation for tridiagonals or bidiagonal
matrices. Dqd/dqds even improved caching more by merging several single arrays into one. For
example the pingpong implementation of the algorithms theoretically needs four arrays: one
for the qi elements, one for ei elements, one for qqi elements and one for eei elements. All
arrays are merged together in an array containing (q1, e1, qq1, ee1, q2, e2, qq2, ee2, ...).
This organisation is very efficient because all loops use only neighboring elements which
is cache friendly.
However, the preprocessing part which transform the initial matrix into tridiagonal or bidiagonal
form can be improved by better layout. The good news here is that this part is a simple one
and could be done easily for several different representations. The postprocessing could also
be improved, but this is useful only when the full orthogonal matrices are required. Sometimes
these matrices are not desired at all because only the eigenvalues or singular values are
needed (and even sometimes only the smallest or largest ones). It is also possible to provide
a way to compute U.v where U is one of the orthogonal matrices and v a vector without explicitely
creating matrix U by using only the set of Householder vectors that defines U.
Don't get me wrong, I do agree with you that there is room for improvement. The current status
is however usable even for large matrices. For dimensions above 300 it is three times faster
than Jama when both eigenvalues and eigenvectors are used and about ten times faster when
only eigenvalues are needed. I know a large international scientific project that do use Jama
and will probably switch to commonsmath with the current status of active maintainance and
performances.
Do you think we should already add an implementation using block decomposition with a fixed
block size around 1020 ?
Luc
>
> That means that these demonstrations are pretty unconvincing to me ...
> I
> wouldn't be surprised if the fact that the advantage evaporates at
> larger
> matrix sizes is just a reflection of a need for better blocking at
> larger
> matrix sizes. This is supported (slightly) by the fact that 1020 is
> pretty
> close to the blocking factor for many implementations.
>
>
> On Fri, Nov 28, 2008 at 1:15 AM, <luc.maisonobe@free.fr> wrote:
>
> >
> >  "Ted Dunning" <ted.dunning@gmail.com> a écrit :
> >
> > > Luc,
> > >
> > > Last I looked, I think I saw that commons math used a double
> indirect
> > > storage format very similar to Jama.
> > >
> > > Is there any thought to going to a higher performance layout such
> as
> > > used by
> > > Colt?
> >
> > Yes.
> > The linear algebra package is defined using interfaces RealMatrix,
> > RealVector and separate implementations RealMatrixImpl,
> RealVectorImpl ...
> > The purpose is to abstract the storage method. For now, only one
> > representation has been used: a simple double[][]. It is possible to
> provide
> > other implementations too, for example a single array, handling the
> index
> > computation by some intermediate layer. I have recently seen a
> discussion on
> > the web where it was argued that single array was more efficient
> only for
> > low dimensions (see
> > http://cio.nist.gov/esd/emaildir/lists/jama/msg01425.html). Also the
> > optimizations of array access in the JVM has been vastly improved
> last few
> > years, so I think it would be worth checking what can be observed
> now.
> > Providing several different implementations using different
> strategies would
> > be a very interesting feature, allowing users to choose the
> implementation
> > that best suit their needs.
> >
> > Another improvement I am thinking about is to provide specialized
> > implementations for some matrices shapes. The first one would simply
> be band
> > matrices. A single class could handle diagonal matrices (when
> configured
> > with 0 subdiagonals and 0 superdiagonals), bidiagonals (lower and
> upper),
> > tridiagonals and even triangular matrices with a rather extreme
> > configuration with n sub or superdiagonal. I thought this would
> better be
> > implemented with a single double array row by row for data locality
> and
> > hence cache efficiency. Sparse matrices would be the next step.
> >
> > We need time to do this.
> >
> > Luc
> >
> > >
> > > On Thu, Nov 27, 2008 at 9:10 AM, <luc.maisonobe@free.fr> wrote:
> > >
> > > > Hello,
> > > >
> > > > This commit is the result of weeks of work. I hope it completes
> an
> > > > important feature
> > > > to [math], computation of eigenvalues and eigenvectors for
> symmetric
> > > real
> > > > matrices.
> > > >
> > > > The implementation is based on algorithms developed in the last
> 10
> > > years or
> > > > so. It is based partly on two reference papers and partly on
> LAPACK.
> > > Lapack
> > > > is distributed under a modifiedBSD license, so this is
> acceptable
> > > for
> > > > [math]. I have updated the NOTICE file and taken care of the
> proper
> > > > attributions in Javadoc.
> > > >
> > > > The current status is that we can solve eigenproblems much
> faster
> > > than Jama
> > > > (see the speed gains in the commit message below). Furthermore,
> the
> > > > eigenvectors are not always computed, they are computed only if
> > > needed. So
> > > > applications that only need eigenvalues will benefit from a
> larger
> > > speed
> > > > gain. This could even be improved again by allowing to compute
> only
> > > some
> > > > eigenvalues, not all of them. This feature is available in the
> > > higher level
> > > > LAPACK routine, but I didn't include it yet. I'll do it only
> when
> > > required,
> > > > as this as already been a very large amount of work.
> > > >
> > > > If someone could test this new decomposition algorithm further,
> I
> > > would be
> > > > more than happy.
> > > >
> > > > My next goal is now to implement Singular Value Decomposition. I
> > > will most
> > > > probably use a method based on eigen decomposition as this seems
> to
> > > be now
> > > > the prefered way since dqd/dqds and MRRR algorithms are
> available.
> > > >
> > > > Luc
> > > >
> > > >  luc@apache.org a écrit :
> > > >
> > > > > Author: luc
> > > > > Date: Thu Nov 27 07:50:42 2008
> > > > > New Revision: 721203
> > > > >
> > > > > URL: http://svn.apache.org/viewvc?rev=721203&view=rev
> > > > > Log:
> > > > > completed implementation of EigenDecompositionImpl.
> > > > > The implementation is now based on the very fast and accurate
> > > dqd/dqds
> > > > > algorithm.
> > > > > It is faster than Jama for all dimensions and speed gain
> increases
> > > > > with dimensions.
> > > > > The gain is about 30% below dimension 100, about 50% around
> > > dimension
> > > > > 250 and about
> > > > > 65% for dimensions around 700.
> > > > > It is also possible to compute only eigenvalues (and hence
> saving
> > > > > computation of
> > > > > eigenvectors, thus even increasing the speed gain).
> > > > > JIRA: MATH220
> > > >
> > > >
> > >
> 
> > > > To unsubscribe, email: devunsubscribe@commons.apache.org
> > > > For additional commands, email: devhelp@commons.apache.org
> > > >
> > > >
> > >
> > >
> > > 
> > > Ted Dunning, CTO
> > > DeepDyve
> > > 4600 Bohannon Drive, Suite 220
> > > Menlo Park, CA 94025
> > > www.deepdyve.com
> > > 6503240110, ext. 738
> > > 8584140013 (m)
> >
> >
> 
> > To unsubscribe, email: devunsubscribe@commons.apache.org
> > For additional commands, email: devhelp@commons.apache.org
> >
> >
>
>
> 
> Ted Dunning, CTO
> DeepDyve
> 4600 Bohannon Drive, Suite 220
> Menlo Park, CA 94025
> www.deepdyve.com
> 6503240110, ext. 738
> 8584140013 (m)

To unsubscribe, email: devunsubscribe@commons.apache.org
For additional commands, email: devhelp@commons.apache.org
