Gábor Hermann created FLINK4961:

Summary: SGD for Matrix Factorization
Key: FLINK4961
URL: https://issues.apache.org/jira/browse/FLINK4961
Project: Flink
Issue Type: New Feature
Components: Machine Learning Library
Reporter: Gábor Hermann
Assignee: Gábor Hermann
We have started an implementation of distributed stochastic gradient descent for matrix factorization
based on Gemulla et al. [1].
The main problem with distributed SGD in general is the conflicting updates of the model variable.
In case of matrix factorization we can avoid conflicting updates by carefully deciding in
each iteration step which blocks of the rating matrix we should use to update the corresponding
blocks of the user and item matrices (see Figure 1. in paper).
Although a general SGD solver might seem relevant for this issue, we can do much better in
the special case of matrix factorization. E.g. in case of a linear regression model, the model
is broadcasted in every iteration. As the model is typically small in that case, we can only
avoid conflicts by having a "global" model. Based on this, the general SGD solver is a different
issue.
To give more details, the algorithm works as follows.
We randomly create user and item vectors, then randomly partition them into {{k}} user and
{{k}} item blocks. Based on these factor blocks we partition the rating matrix to {{k * k}}
blocks correspondingly.
In one iteration step we choose {{k}} nonconflicting rating blocks, i.e. we should not choose
two rating blocks simultaneously with the same user or item block. This is done by assigning
a rating block ID to every user and item block. We match the user, item, and rating blocks
by the current rating block ID, and update the user and item factors by the ratings locally.
We also update the rating block ID for the factor blocks, thus in the next iteration we use
other rating blocks to update the factors.
In {{k}} iteration we sweep through the whole rating matrix of {{k * k}} blocks (so instead
of {{numberOfIterationSteps}} iterations we should do {{k * numberOfIterationSteps}} iterations).
[1] [http://people.mpiinf.mpg.de/~rgemulla/publications]/gemulla11dsgd.pdf

This message was sent by Atlassian JIRA
(v6.3.4#6332)
