mahout-dev mailing list archives

From "Dmitriy Lyubimov (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-1365) Weighted ALS-WR iterator for Spark
Date Mon, 02 Jun 2014 17:50:02 GMT


Dmitriy Lyubimov commented on MAHOUT-1365:

[~ssc] Since you've done this before, can you please eyeball this and make a suggestion?
My current implementation proceeds with computations based on formula (7) in the pdf, which
in turn is derived directly from both papers. (We factor out the baseline confidence, which
I denote c_0; in that case the expression under inversion comes apart into V'V, which is common
and tiny for all item vectors, so it is computed just once and broadcast, plus an individual
per-item correction U'D^(i)U, which takes only the rows of U where confidence is non-trivial
(c != c_0).)
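That decomposition can be checked numerically. Below is a plain-Python sketch (not Mahout code; the baseline c_0, factor dimension k, and the random data are all made up for illustration) that builds the per-item Gram matrix both directly and as the shared c_0·U'U term plus the sparse correction U'D^(i)U over non-trivial confidences, and confirms the two agree:

```python
import random

def outer(x, y):
    # outer product x y' as a list-of-lists matrix
    return [[xi * yj for yj in y] for xi in x]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(s, A):
    return [[s * a for a in row] for row in A]

random.seed(0)
n_users, k = 6, 3
c0 = 1.0  # hypothetical baseline confidence

U = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n_users)]
# confidences for one item i: baseline c0 everywhere, a few elevated entries
c = [c0] * n_users
c[1], c[4] = 5.0, 3.0

# direct computation: sum_u c_u * (u u')
direct = [[0.0] * k for _ in range(k)]
for cu, uu in zip(c, U):
    direct = mat_add(direct, mat_scale(cu, outer(uu, uu)))

# decomposed: c0 * U'U (shared, computable once and broadcast)
gram = [[0.0] * k for _ in range(k)]
for uu in U:
    gram = mat_add(gram, outer(uu, uu))
decomposed = mat_scale(c0, gram)
# ... plus the sparse correction, touching only rows where c != c0
for cu, uu in zip(c, U):
    if cu != c0:
        decomposed = mat_add(decomposed, mat_scale(cu - c0, outer(uu, uu)))

for r1, r2 in zip(direct, decomposed):
    for a, b in zip(r1, r2):
        assert abs(a - b) < 1e-9
print("identity holds")
```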

That kind of means that every U row has to send a message to every V row for which c != c_0.
I previously did this with Pregel. It turns out that in Spark, Bagel is a moot point, since
it simply uses groupBy underneath rather than a custom multicast communication. Still,
if I did it today, I would have to do a coGroup or something similar to achieve the same effect.
The question is whether there's a neat way to translate this into our current set of linear
algebra primitives, or whether this would be our first case where we have to create a method
that is in part tightly coupled to Spark. Any thoughts?
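To make the data movement concrete, here is a plain-Python sketch of the pattern a coGroup would perform (this is not Mahout or Spark code; the user ids, item ids, and confidence triples are invented for illustration): sparse non-baseline confidences are grouped by item, which delivers to each item exactly the U rows with c != c_0, from which the per-item correction U'D^(i)U is assembled.

```python
from collections import defaultdict

c0 = 1.0  # hypothetical baseline confidence
k = 2     # factor dimension

# user-factor rows, keyed by user id (toy data)
U = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 2.0]}

# sparse non-baseline confidences as (user, item, c) triples
triples = [(0, 10, 4.0), (2, 10, 3.0), (1, 11, 6.0)]

# "multicast" step: each U row is sent to every item with c != c0;
# grouping by item key plays the role of coGroup in Spark
inbox = defaultdict(list)
for u, i, c in triples:
    if c != c0:
        inbox[i].append((U[u], c))

def correction(msgs):
    # per-item correction U' D^(i) U, built only from the delivered rows
    M = [[0.0] * k for _ in range(k)]
    for row, c in msgs:
        w = c - c0
        for a in range(k):
            for b in range(k):
                M[a][b] += w * row[a] * row[b]
    return M

corrections = {i: correction(msgs) for i, msgs in inbox.items()}
print(corrections[10])  # → [[3.0, 0.0], [0.0, 8.0]]
```

The shared V'V (or U'U) term never travels with these messages; only the sparse corrections do, which is what makes the per-item solve cheap for sparse confidence.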

> Weighted ALS-WR iterator for Spark
> ----------------------------------
>                 Key: MAHOUT-1365
>                 URL:
>             Project: Mahout
>          Issue Type: Task
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>         Attachments: distributed-als-with-confidence.pdf
> Given preference P and confidence C distributed sparse matrices, compute the ALS-WR solution
> for implicit feedback (Spark Bagel version).
> Following the Hu-Koren-Volinsky method (stripping off any concrete methodology to build the
> C matrix), with a parameterized test for convergence.
> The computational scheme follows the ALS-WR method (which should be slightly more efficient
> for sparser inputs).
> The best performance will be achieved if non-sparse anomalies are prefiltered (eliminated),
> such as an anomalously active user, which doesn't represent a typical user anyway.
> The work is going here.
> I am porting our (A1) implementation away, so there are a few issues associated with that.

This message was sent by Atlassian JIRA
