mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Baker <>
Subject Re: Talking about optimization
Date Mon, 09 Jun 2014 23:45:38 GMT
I just joined the mailing lists and have apparently dropped in the middle
of the conversation, so excuse me if I'm repeating stuff that's already
been said and/or isn't useful.

Sparse mat-vec is, in general, a bandwidth-limited operation. It has very
low arithmetic density and is, in general, a waste of a good CPU. I'll
assume dense operand vector, since that is the use case for most algorithms
(and since it's what I'm most familiar with ;).

The best options for improving the performance are to improve the memory
access patterns through the operand vector (assuming SRM). One of the
simplest way to do that is to turn your sparse matrix into a block matrix
and then rely on efficient local/dense matvec to make up for the zero
elements that you undoubtedly introduced; this generally requires
reordering the matrix to expose these blocks. If the analysis required for
this isn't itself parallel, then you've traded your embarrassingly parallel
(albeit, computationally dense) for a largely serial operation; you'd
better be using a lot of times if you want to amortize that cost.

There is a whole class of communication avoiding algorithms that seek to
avoid this off-die and off-node data transfer, for example, by applying the
matrix in powers (exploit a little extra transfer to perform A^3, A^2 and A
at the same time and perform three steps of, e.g., Lanczos, while paying
for the latency of only one). However, the performance of these is still
very dependent on the sparsity structure of the matrix, and the algorithms
are significantly more difficult and numerically sensitive.

Block algorithms (those that apply the matrix to multiple operand vectors
at the same time) are also useful, because they allow for re-use of the
sparsity structure. Course, that's a chicken-egg issue; there's limited
incentive to write block algorithms without a block sparse mat-vec, and
limited reason to write a blocked mat-vec without the indication that folks
will write algorithms.

On Mon, Jun 9, 2014 at 7:20 PM, Dmitriy Lyubimov <> wrote:

> no argument against it, except who's going to do e2e local math optimizer
> and how long it will take. maybe, both is possible -- faster local gains we
> know current algorithms need, and longer term overhaul that included e2e
> in-core plans.
> On Mon, Jun 9, 2014 at 4:16 PM, Ted Dunning <> wrote:
> > On Mon, Jun 9, 2014 at 4:12 PM, Dmitriy Lyubimov <>
> > wrote:
> >
> > > One open question i had was,well, since the outlined approach avoids
> > > logical plans still for in-core matrix, it would be difficult to
> optimize
> > > the expressions in this context (i.e. figure out best-working _output_
> > > structure).
> > >
> >
> > This is the single biggest flaw that I see.  To do higher level
> > optimizations, the lower levels have to expose costs for possible plans.
> >
> > Taking a major whack at the local cases for element-wise and matrix
> > multiply is still a good thing to do.
> >

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message