mahout-dev mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Problem of dimensions
Date Wed, 16 Jul 2014 14:53:21 GMT
There IS no issue with nrow being a lazy val. I never touch it; read below.

Creating a new matrix val is fine if it doesn’t cause a new RDD to be created. I’ll look
into that.

rbind, as I read it, requires me to construct the rows to be added. I don’t know what their
keys are and don’t want to calculate them. If I’m right about how the math works, the actual
rows are not needed. rbind looks like a much heavier weight operation than just changing the
row cardinality, and it works for other cases where you are adding real vectors.
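
A minimal sketch of the distinction under discussion, assuming a hypothetical SparseDrm
class that stands in for a sparse, row-keyed matrix. It is not Mahout's
CheckpointedDrmSpark; the class, its fields, and both method bodies are illustrative
assumptions. It shows a lazy val nrow derived from a var _nrow, an addEmptyRows that only
changes the declared cardinality, and an rbind that needs keyed rows and returns a new
object.

```scala
// Hypothetical stand-in for a sparse, row-keyed matrix (NOT Mahout's actual class).
// Only non-empty rows are stored; _nrow is the declared row cardinality.
final class SparseDrm(
    val rows: Map[Int, Map[Int, Double]],
    private var _nrow: Int,
    val ncol: Int) {

  // Computed from _nrow on first read, then cached for the life of the object.
  lazy val nrow: Int = _nrow

  // Cheap: no row data is constructed, only the declared cardinality grows.
  // Has an effect on nrow only if called before nrow is first read.
  def addEmptyRows(n: Int): Unit = {
    require(n >= 0, "cannot reduce the number of rows")
    _nrow += n
  }

  // Heavier: the caller must supply keyed rows, which are re-keyed and merged
  // into a brand-new object, so lazy evaluation is never in question.
  def rbind(other: SparseDrm): SparseDrm = {
    require(ncol == other.ncol, "column counts must match")
    val shifted = other.rows.map { case (k, v) => (k + nrow, v) }
    new SparseDrm(rows ++ shifted, nrow + other.nrow, ncol)
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val a = new SparseDrm(Map(0 -> Map(1 -> 1.0)), 1, 3)
    a.addEmptyRows(2)          // changed before nrow is ever read
    println(a.nrow)            // 3

    val b = a.rbind(new SparseDrm(Map.empty, 2, 3))
    println(b.nrow)            // 5, and a is untouched

    val c = new SparseDrm(Map.empty, 4, 3)
    println(c.nrow)            // 4, this read caches the value
    c.addEmptyRows(1)
    println(c.nrow)            // still 4: the lazy val was already evaluated
  }
}
```

The last case is the lazy evaluation concern: changing _nrow is only safe while nrow has
not yet been read, which is the guarantee the constructor ordering is meant to provide.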

I’ll look deeper now that cross-cooccurrence seems to be fixed.

On Jul 15, 2014, at 7:40 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

The rbind approach also gives a new object and avoids all questions of lazy
evaluation.



On Tue, Jul 15, 2014 at 1:04 PM, Anand Avati <avati@gluster.org> wrote:

> 
> 
> 
> On Tue, Jul 15, 2014 at 12:45 PM, Pat Ferrel <pat.ferrel@gmail.com> wrote:
> 
>> I appreciate the thoughts.
>> 
>> I don’t change nrow; it is still a lazy val. I change _nrow, which is a
>> var and is used to calculate nrow when it is needed. The only thing run on
>> them is the CheckpointedDrmSpark constructor. The class exists to guarantee
>> the drm is pinned down, and _nrow is changed after construction but before
>> any math is done on it. Changing _nrow may be safe on a
>> CheckpointedDrmSpark, but the question is why. I’ll put it up on a PR.
>> 
>> btw I was thinking of calling the method
>> CheckpointedDrmSpark#addEmptyRows. Since the matrix is sparse it will just
>> change _nrow, and the name will flag the purpose of the method, not to
>> mention that it avoids the question about reducing the number of rows.
> 
> 
> 
> I would prefer a new rbind() operator instead of addEmptyRows() method.
> Just feels more consistent.
> 
> Thanks
> 

