mahout-dev mailing list archives

From Dmitriy Lyubimov <>
Subject Re: Problem of dimensions
Date Mon, 21 Jul 2014 21:21:33 GMT
Also, note that parallelizeEmpty() creates nothing more than a standard
int-keyed matrix with all rows indexed accordingly. That means it cannot be
r-bound with a matrix that is not int-keyed (though perhaps it could be bound
after an intermediate map-block step that rewrites the keys).
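The key-type restriction can be illustrated with a toy in-memory model. This is not Mahout code. `ToyDrm`, `parallelize_empty`, `rbind` and `rekey` are hypothetical names standing in for a DRM, `drmParallelizeEmpty`, `rbind` and a key-rewriting map-block step, and a Python dict stands in for the distributed rdd. The model assumes that rbind shifts the second operand's Int keys by the first operand's nrow.

```python
# Hypothetical toy model, NOT the Mahout API: a "DRM" is a dict of
# row key -> sparse row (dict col -> value) plus declared nrow/ncol.
from dataclasses import dataclass
from typing import Dict, Hashable

Row = Dict[int, float]

@dataclass(frozen=True)
class ToyDrm:
    rows: Dict[Hashable, Row]
    nrow: int
    ncol: int

def parallelize_empty(nrow: int, ncol: int) -> ToyDrm:
    # Every key 0..nrow-1 is present, each mapped to an empty (all-zero) row.
    return ToyDrm({i: {} for i in range(nrow)}, nrow, ncol)

def rbind(a: ToyDrm, b: ToyDrm) -> ToyDrm:
    # Stacking needs Int keys on both sides: b's keys are shifted by a.nrow.
    for drm in (a, b):
        if not all(isinstance(k, int) for k in drm.rows):
            raise TypeError("rbind requires int-keyed operands")
    if a.ncol != b.ncol:
        raise ValueError("column counts differ")
    stacked = dict(a.rows)
    stacked.update({k + a.nrow: r for k, r in b.rows.items()})
    return ToyDrm(stacked, a.nrow + b.nrow, a.ncol)

def rekey(drm: ToyDrm, index: Dict[Hashable, int]) -> ToyDrm:
    # Stand-in for an intermediate map-block that maps, e.g., String keys
    # to Int keys using a caller-supplied dictionary.
    return ToyDrm({index[k]: r for k, r in drm.rows.items()}, drm.nrow, drm.ncol)

# Usage: a String-keyed matrix is rejected until its keys are rewritten.
users = ToyDrm({"alice": {0: 1.0}, "bob": {2: 3.0}}, nrow=2, ncol=3)
try:
    rbind(users, parallelize_empty(2, 3))
    string_keys_rejected = False
except TypeError:
    string_keys_rejected = True

ids = rekey(users, {"alice": 0, "bob": 1})
stacked = rbind(ids, parallelize_empty(2, 3))  # nrow 4; rows 2 and 3 are empty
```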

On Mon, Jul 21, 2014 at 1:42 PM, Pat Ferrel <> wrote:

> Thank you! This is what I understood and I’m doing a little dance for joy
> (in my mind). This makes sparseness all-encompassing, at least for
> sequential Int keys.
> However, Anand has found several math ops that don’t work.
> I’ll write up a few tests, at least for transpose and multiply, since these
> are used in cooccurrence. And I’ll be happy to implement something that
> changes nrow in an immutable, R-like way. Anand and Ted suggested an rbind
> of drmParallelizeEmpty with added row cardinality. This would only
> change nrow of the resulting CheckpointedDrm; it would not alter the rdd.
> On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <> wrote:
> "Missing" rows are only meaningful in the context of int-keyed matrices and
> physical transposition operations. Those are the only ones that may depend
> on it, since one obviously can't define "missing-ness" for something that is
> String-keyed.
> So the only thing that may fail because of the "missing-ness" effect is
> probably the physical transposition operator (we don't have a test for such
> a case, so there may be a bug there). Everything else should work.
> And no, I suppose it is OK to have "missing" rows even in the case of
> int-keyed matrices.
> There's one thing you should probably be aware of in this context,
> though: many algorithms don't survive empty (row-less) partitions,
> however they come about. Other than that, I don't think every row
> must be present, even if there's an implied order of the rows.
> On Mon, Jul 21, 2014 at 12:22 PM, Pat Ferrel <> wrote:
>> I appreciate that you can’t read all the back-and-forth, Dmitriy, hence the
>> private email. Please disregard all other code or talk in the thread for
>> the moment.
>> Does a DRM need to have a row for every sequential row key from 0 to
>> nrow-1? Can there be missing row keys in the sequence, and will they be
>> treated as {}, an all-zero row? In terms of the rdd in the CheckpointedDrm,
>> these “missing” rows will not have a corresponding n => {}; they simply will
>> not exist in the rdd. This will happen when a row is “missing” from the DRM
>> but the true cardinality is known and passed in to the CheckpointedDrm
>> constructor.
>> Will R-like operations on these matrices work correctly? Will A.t %*% A
>> and A + 1 work correctly?
>> The answer is no, but _should_ they work correctly?
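The question about missing rows can be made concrete with a toy model. In it, any key below nrow that is absent from the stored rows means an all-zero row. This is a sketch under assumptions, not Mahout's implementation. `SparseDrm`, `with_nrow`, `ata`, `plus_scalar`, `plus_scalar_naive` and `transpose` are hypothetical names, and plain dicts stand in for the rdd.

The model suggests that the answer depends on the operator:

- A.t %*% A is a sum of per-row outer products, A'A = sum_i a_i a_i', so absent rows contribute nothing.
- A + 1 turns every zero row into a row of ones, so it must emit rows that were never stored.
- A physical transpose has to take its column count from the declared nrow rather than from the largest key it sees.

```python
# Hypothetical toy model, NOT Mahout code: a missing key in 0..nrow-1 means
# an all-zero row; nrow is stored separately, like the cardinality passed
# to the CheckpointedDrm constructor.
from dataclasses import dataclass, replace
from typing import Dict, List

Row = Dict[int, float]

@dataclass(frozen=True)
class SparseDrm:
    rows: Dict[int, Row]  # only the rows that physically exist
    nrow: int
    ncol: int

    def with_nrow(self, nrow: int) -> "SparseDrm":
        # Immutable, R-like cardinality change: same stored rows, new nrow.
        if nrow < self.nrow:
            raise ValueError("cannot shrink below current nrow")
        return replace(self, nrow=nrow)

def to_dense(a: SparseDrm) -> List[List[float]]:
    return [[a.rows.get(i, {}).get(j, 0.0) for j in range(a.ncol)]
            for i in range(a.nrow)]

def ata(a: SparseDrm) -> List[List[float]]:
    # Sum of outer(a_i, a_i) over stored rows; missing rows add zero.
    out = [[0.0] * a.ncol for _ in range(a.ncol)]
    for row in a.rows.values():
        for j, vj in row.items():
            for k, vk in row.items():
                out[j][k] += vj * vk
    return out

def plus_scalar_naive(a: SparseDrm, s: float) -> SparseDrm:
    # Wrong when rows are missing: only stored rows get + s.
    rows = {i: {j: r.get(j, 0.0) + s for j in range(a.ncol)}
            for i, r in a.rows.items()}
    return SparseDrm(rows, a.nrow, a.ncol)

def plus_scalar(a: SparseDrm, s: float) -> SparseDrm:
    # Correct: every key 0..nrow-1 is produced, including missing ones.
    rows = {i: {j: a.rows.get(i, {}).get(j, 0.0) + s for j in range(a.ncol)}
            for i in range(a.nrow)}
    return SparseDrm(rows, a.nrow, a.ncol)

def transpose(a: SparseDrm) -> SparseDrm:
    # Missing rows become zero columns; result ncol comes from a.nrow.
    cols: Dict[int, Row] = {}
    for i, row in a.rows.items():
        for j, v in row.items():
            cols.setdefault(j, {})[i] = v
    return SparseDrm(cols, a.ncol, a.nrow)

# Rows 1 and 3 are missing; the true cardinality is 4.
a = SparseDrm({0: {0: 1.0, 1: 2.0}, 2: {1: 3.0}}, nrow=4, ncol=2)
gram = ata(a)                 # [[1.0, 2.0], [2.0, 13.0]], same as dense A'A
good = plus_scalar(a, 1.0)    # rows 1 and 3 become [1.0, 1.0]
bad = plus_scalar_naive(a, 1.0)  # rows 1 and 3 silently stay zero
at = transpose(a)             # 2 x 4, columns 1 and 3 all zero
```

In this model, `with_nrow` returns a new object that shares the stored rows. That mirrors the suggestion that changing cardinality should alter only nrow, not the rdd.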
