mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jake Mannix <jake.man...@gmail.com>
Subject Re: [jira] Updated: (MAHOUT-537) Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
Date Thu, 06 Jan 2011 21:17:44 GMT
Hi Shannon, sorry to have been absent too much in this thread!

On Thu, Dec 30, 2010 at 2:16 PM, Shannon Quinn <squinn@gatech.edu> wrote:

> I'm just about finished with this patch (though I'm road tripping at the
> moment), but I wanted to seek some clarification on the mechanics behind
> DRM's matrix multiplication.
>
> I see upon closer inspection that what is actually used is the transpose of
> the multiplicand (matrix A^T in A*B), thereby using only matrix rows (how
> DRMs are organized across HDFS). However, I didn't see any explicit
> transpose operation within the times() method. How is this carried out?
>

The transpose operation is a side effect of the fact that a DRM just
consists of a list of vectors, and you could view it as a row-based matrix,
or a column based matrix.  The matrix multiplication like so:

  Matrix A has N rows (each of which has cardinality M_A), and Matrix B has
N rows (each of which has cardinality M_B).  There are thus N pairs of
vectors {A_i, B_i}, and if you take MatrixSum_{i=1,N} (A_i^T x B_i), you get
a matrix with M_A rows, each of which has cardinality M_B, and this matrix
is exactly A^T * B.

*You take the transpose on the vectors, row at a time*, from the first of
the two matrices.

  -jake


> I want to understand this little bit so I adequately replicate it in the
> new patch. Thanks!
>
> Shannon
>
> Apologies for the brevity, this was sent from my iPhone
>
> On Dec 29, 2010, at 1:06, "Shannon Quinn (JIRA)" <jira@apache.org> wrote:
>
> >
> >     [
> https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
> >
> > Shannon Quinn updated MAHOUT-537:
> > ---------------------------------
> >
> >    Attachment: MAHOUT-537.patch
> >
> > Updated patch. Fixes from previous patch are included, this time merged
> with unrelated changes to the related files. Also removed all the
> commented-out old code, and even caught and fixed a few bugs. Fully
> implemented timesSquared(). All that remains is the times(DRM) job. Will
> update on this very soon.
> >
> > (regarding the previous comments on this ticket: I'm using Hadoop 0.20.2)
> >
> >> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> >> -------------------------------------------------------------
> >>
> >>                Key: MAHOUT-537
> >>                URL: https://issues.apache.org/jira/browse/MAHOUT-537
> >>            Project: Mahout
> >>         Issue Type: Improvement
> >>   Affects Versions: 0.4
> >>           Reporter: Shannon Quinn
> >>           Assignee: Shannon Quinn
> >>        Attachments: MAHOUT-537.patch, MAHOUT-537.patch
> >>
> >>
> >> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2
> API, in particular eliminate dependence on the deprecated JobConf, using
> instead the separate Job and Configuration objects.
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message