mahout-dev mailing list archives

From Anand Avati <av...@gluster.org>
Subject Re: Problem of dimensions
Date Mon, 14 Jul 2014 18:04:57 GMT
On Mon, Jul 14, 2014 at 10:58 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> On Mon, Jul 14, 2014 at 9:47 AM, Pat Ferrel <pat@occamsmachete.com> wrote:
>
> > BTW that requires that drm.nrow be mutable. That is defined as immutable
> > in the DSL and so will require a change to several traits. I’ve done this
> > but am still trying to decide on the cleanest approach.
>
>
> Hmmm.... immutability has lots of virtues.  And changing nrows is just the
> tip of the iceberg.  You also have to shuffle the rows to match the row
> partitioning between the two matrices.
>
> Or it requires more than one pass through the data.  Since you have to read
> both matrices before you can deal with either, and since one matrix is
> likely to be shuffled relative to the other, might it just be better to
> either do two read passes or pay the cost to shuffle the matrices after
> getting a consensus view? Note that the second read pass will have to do a
> shuffle anyway, so the only saving from doing two passes is lower memory
> usage.
>
> *Anand,*
>
> I think I remember you were addressing a shuffle problem in some of your
> earlier work.  What did you conclude?
>
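The two-pass alternative described above can be sketched in plain Python. This is a hypothetical in-memory illustration, not Mahout code: each input matrix is assumed to be a list of `(row_key, col, value)` triples, and the helper names `consensus_index` and `co_partition` are invented for this example. Pass one reads only the row keys of both inputs to build a shared ("consensus") row index; pass two assigns every row of each matrix to a partition computed from that shared index, so matching rows of A and B land in the same partition and both matrices get the same `nrow`.

```python
from collections import defaultdict


def consensus_index(*inputs):
    """Pass 1: collect every row key seen in any input and give it a dense index."""
    keys = sorted({row_key for triples in inputs for row_key, _, _ in triples})
    return {key: i for i, key in enumerate(keys)}


def co_partition(triples, index, num_partitions):
    """Pass 2: group one matrix's rows into partitions using the shared index.

    Rows are assigned by contiguous ranges of the consensus index (an assumption
    for this sketch), so a given row key lands in the same partition for every
    matrix partitioned with the same index and partition count.
    """
    nrow = len(index)
    rows_per_part = -(-nrow // num_partitions)  # ceiling division
    parts = [defaultdict(dict) for _ in range(num_partitions)]
    for row_key, col, value in triples:
        i = index[row_key]
        parts[i // rows_per_part][i][col] = value
    return parts


# Usage: two hypothetical inputs that share only the row key "u3".
a = [("u1", 0, 1.0), ("u3", 1, 2.0)]
b = [("u2", 0, 5.0), ("u3", 0, 1.0)]
index = consensus_index(a, b)  # both matrices now have nrow == len(index)
pa = co_partition(a, index, 2)
pb = co_partition(b, index, 2)
```

Only the set of row keys has to be held between the two passes, which is where the memory saving comes from; the second pass still moves rows, so it is a shuffle either way.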

I think the larger question is: what does it mean to make drm.nrow mutable?
If it is changed to a smaller value, which rows do you "sacrifice"? Why not
just do a RowRange operation to get a new DRM with fewer rows (instead of
mutating the given drm)? After that, if you care specifically about
partitioning, the Par operator can shuffle data for you.
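The slice-instead-of-mutate pattern can be illustrated with a toy immutable matrix. `ToyDrm` and its `row_range`, `repartition` and `partitions` methods are hypothetical stand-ins, not the Mahout DSL: the original object keeps its `nrow`, the caller states explicitly which rows to keep, and each operation returns a new value. The modulo partition assignment is also an assumption made only for this sketch.

```python
import dataclasses
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ToyDrm:
    """Toy stand-in for a distributed row matrix; fields cannot be reassigned."""
    nrow: int
    ncol: int
    rows: Tuple[Tuple[int, Tuple[float, ...]], ...]  # (row index, row values)
    num_partitions: int = 1

    def row_range(self, start: int, stop: int) -> "ToyDrm":
        """Return a NEW matrix holding rows [start, stop), reindexed from 0."""
        if not (0 <= start <= stop <= self.nrow):
            raise ValueError("row range out of bounds")
        kept = tuple((i - start, v) for i, v in self.rows if start <= i < stop)
        return ToyDrm(stop - start, self.ncol, kept, self.num_partitions)

    def repartition(self, num_partitions: int) -> "ToyDrm":
        """Return a NEW matrix with a different partition count (cf. Par)."""
        return ToyDrm(self.nrow, self.ncol, self.rows, num_partitions)

    def partitions(self) -> Dict[int, List[int]]:
        """Group row indices by partition using simple modulo assignment."""
        parts: Dict[int, List[int]] = {p: [] for p in range(self.num_partitions)}
        for i, _ in self.rows:
            parts[i % self.num_partitions].append(i)
        return parts


# Usage: slice a 4-row matrix down to 3 rows, then repartition the slice.
a = ToyDrm(4, 2, ((0, (1.0, 0.0)), (1, (0.0, 1.0)), (2, (1.0, 1.0)), (3, (2.0, 0.0))))
top = a.row_range(0, 3)       # a.nrow is still 4; top.nrow is 3
b = top.repartition(2)        # same rows, two partitions

try:
    a.nrow = 3                # mutation is rejected
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False
```

Because `a` is never modified, any other code still holding it sees a consistent `nrow`, and the question of which rows to drop is answered by the caller's explicit range rather than by the mutation.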
