My apologies, but I have totally lost track of who said what because too
many comments have enormous lines immediately adjacent to responses.
On Tue, Jan 1, 2013 at 11:39 AM, Somebody <some@body.org> wrote:
> I thought that maybe it was due to the underlying
> (dynamic) data structure for sparse vectors/matrices
> (OpenIntToDoubleHashMap), while typical storage schemes (compressed row,
> skyline) might be more efficient (since only arrays are used), but do not
> lend themselves to changes of the value of initially null coefficients.
> That's why I was suggesting immutable matrices as a start, but what I
> really meant was matrices where the null coefficients are constant.
>
> Please note that your objection does not hold with iterative linear solvers
> (conjugate gradient, ...), so immutable matrices might still be
> interesting.
>
One of the benefits of making it easy to extend matrices can be found in
Mahout (which I should emphasize is probably only a source of ideas, not a
perfect paragon of perfection by any means).
As a result of easy extensibility, we have in mahout two kinds of sparse
vectors and many kinds of sparse matrices.
One kind of sparse vector uses an open hash table (RandomAccessHashTable).
It works well for mutable situations, but is a bit more memory hungry than
might be liked.
The other kind is implemented using an array of indexes and an array of
doubles. It can be read randomly with log n worst case performance or log
log n performance if the indices are well distributed. It is nearly
immutable in that after a sequence of mutations, it requires substantial
time and possibly memory to merge itself back together for more read
operations. What it does phenomenally well, however, is support iteration
with little memory overhead.
As far as matrices are concerned, we have a dense matrix backed by a single
indexed double[]. We have a row sparse matrix that has rows that are
either kind of sparse vectors. We have block sparse matrices that has row
patches that are matrices which need not exist, but are created lazily if
written to. We have memory mapped dense matrices. We have a memory mapped
dense matrix that maps several regions of large files together to form a
single matrix (since mmap only supports 2GB files).
Some of these storage forms are in Mahout math. Some are in applications.
The virtue here is that we didn't have to discuss these matters much. We
could just say "Sounds like a great idea, can you build a prototype to
demonstrate your point?" to any bright spark who came along. Many
prototypes were built and several were incorporated.
So the impact of a simple core API (with linear algebra, mutable operations
and OK, but not great visitor patterns) and associated abstract classes and
abstract tests was as much social as technical. The social virtues have,
in turn, led to technical virtues.
