mahout-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: [jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile<Writable,VectorWritable> instead of SequenceFile<IntWritable,VectorWritable>
Date Thu, 04 Mar 2010 17:10:44 GMT
To be conformable, the dictionaries must be the identical object.

At that point, you do what we do now.  The labels are irrelevant to the dot
product and are only used during input and output.

In pseudo code:

       Dictionary d = new ...
       LabeledSparseFancyMatrix a = new LabeledSparseFancyMatrix(d)
       read(a)   // this modifies d
       LabeledSparseFancyMatrix b = new LabeledSparseFancyMatrix(d)
       read(b)   // this also modifies d

       a.times(b)    // this checks that a.d == b.d, then does
                     // a.rawMatrix.times(b.rawMatrix)

Make sense?
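Here is a minimal runnable Java sketch of the scheme above. It rests on some assumptions. `Dictionary`, `LabeledMatrix`, `set`, `get` and `times` are hypothetical names written for illustration, not Mahout classes. Storage is a plain `HashMap` of sparse rows. A single dictionary labels both rows and columns, as in the pseudocode. Labels are translated to int indexes only in `set` (input) and `get` (output). `times` first checks that both operands hold the identical dictionary object, then multiplies using int indexes alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LabeledMatrixSketch {

    /** Hypothetical label dictionary: assigns dense int ids to labels on first sight. */
    static final class Dictionary {
        private final Map<String, Integer> ids = new HashMap<>();
        private final List<String> labels = new ArrayList<>();

        /** Returns the id for a label, adding the label if it is new (this modifies the dictionary). */
        int intern(String label) {
            Integer id = ids.get(label);
            if (id == null) {
                id = labels.size();
                ids.put(label, id);
                labels.add(label);
            }
            return id;
        }

        /** Returns the id for a known label, or null if the label was never seen. */
        Integer lookup(String label) {
            return ids.get(label);
        }
    }

    /** Hypothetical labeled sparse matrix: labels only at the edges, int indexes inside. */
    static final class LabeledMatrix {
        final Dictionary d;
        // Raw storage: row index -> (column index -> value).
        final Map<Integer, Map<Integer, Double>> raw = new HashMap<>();

        LabeledMatrix(Dictionary d) {
            this.d = d;
        }

        /** Input path: translating labels to indexes may grow the shared dictionary. */
        void set(String row, String col, double value) {
            raw.computeIfAbsent(d.intern(row), k -> new HashMap<>()).put(d.intern(col), value);
        }

        /** Output path: translate labels back to indexes; unknown labels read as 0. */
        double get(String row, String col) {
            Integer r = d.lookup(row), c = d.lookup(col);
            if (r == null || c == null) return 0.0;
            return raw.getOrDefault(r, Map.of()).getOrDefault(c, 0.0);
        }

        LabeledMatrix times(LabeledMatrix other) {
            // Conformability: the dictionaries must be the same object, not merely equal.
            if (this.d != other.d) {
                throw new IllegalArgumentException("matrices do not share a dictionary");
            }
            LabeledMatrix result = new LabeledMatrix(d);
            // From here on only int indexes are used; labels play no part.
            for (Map.Entry<Integer, Map<Integer, Double>> row : raw.entrySet()) {
                Map<Integer, Double> out = new HashMap<>();
                for (Map.Entry<Integer, Double> aik : row.getValue().entrySet()) {
                    Map<Integer, Double> bRow = other.raw.get(aik.getKey());
                    if (bRow == null) continue;
                    for (Map.Entry<Integer, Double> bkj : bRow.entrySet()) {
                        out.merge(bkj.getKey(), aik.getValue() * bkj.getValue(), Double::sum);
                    }
                }
                if (!out.isEmpty()) result.raw.put(row.getKey(), out);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Dictionary d = new Dictionary();
        LabeledMatrix a = new LabeledMatrix(d);
        a.set("x", "y", 2.0);   // modifies d
        LabeledMatrix b = new LabeledMatrix(d);
        b.set("y", "z", 3.0);   // also modifies d
        System.out.println(a.times(b).get("x", "z")); // prints 6.0
    }
}
```

Checking reference identity (`!=`) rather than `equals` keeps the conformability test constant-time. The cost is that every matrix to be combined must be read through the same dictionary instance.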

On Thu, Mar 4, 2010 at 8:59 AM, Jake Mannix <jake.mannix@gmail.com> wrote:

> On Thu, Mar 4, 2010 at 8:54 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > I haven't examined the out-of-core scenarios at all, but in memory, it is
> > possible to have labels with no performance cost if you add the
> > constraint that labeled matrices are only conformable if they share the
> > identical label dictionary.  That implies that you can use the internal
> > row and column indexes for all internal operations.
>
>
> Care to elaborate?  If you're multiplying a matrix by a vector, both
> labeled by Map<Integer,String> and reverse Map<String,Integer> for both
> the rows and columns (and they match in the right way), what is the fast
> way to do the individual dot products that performs comparably to
> walking the sparse int[] / double[] parallel arrays?
>
>  -jake
>
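Jake's question is about the inner loop. Under the shared-dictionary constraint, the labels never enter that loop. Here is a minimal Java sketch, assuming each sparse vector is stored as a sorted `int[]` of indexes plus a parallel `double[]` of values. The class and method names are hypothetical. Once both operands are known to share one dictionary, the dot product can be exactly the usual merge walk over the parallel arrays.

```java
public class SparseDot {

    /**
     * Dot product of two sparse vectors, each stored as parallel arrays of
     * strictly increasing indexes and their values. No labels are consulted here.
     */
    static double dot(int[] ai, double[] av, int[] bi, double[] bv) {
        double sum = 0.0;
        int p = 0, q = 0;
        // Advance whichever cursor points at the smaller index; multiply on a match.
        while (p < ai.length && q < bi.length) {
            if (ai[p] == bi[q]) {
                sum += av[p++] * bv[q++];
            } else if (ai[p] < bi[q]) {
                p++;
            } else {
                q++;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] ai = {0, 3, 7};
        double[] av = {1.0, 2.0, 4.0};
        int[] bi = {3, 5, 7};
        double[] bv = {10.0, 9.0, 0.5};
        System.out.println(dot(ai, av, bi, bv)); // prints 22.0 (2*10 + 4*0.5)
    }
}
```

The walk costs O(nnz(a) + nnz(b)), the same as for unlabeled vectors. In this sketch the dictionary lookups happen when the vectors are read in, not on each multiplication.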



-- 
Ted Dunning, CTO
DeepDyve
