mahout-dev mailing list archives

From "Sebastian Schelter (Commented) (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-872) Revisit the parallel ALS matrix factorization
Date Fri, 04 Nov 2011 09:39:00 GMT


Sebastian Schelter commented on MAHOUT-872:

High-level documentation added at
> Revisit the parallel ALS matrix factorization
> ---------------------------------------------
>                 Key: MAHOUT-872
>                 URL:
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.6
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 0.6
> Our current code for computing a decomposition of a rating matrix with Alternating Least
> Squares (ALS) uses a lot of highly inefficient reduce-side joins.
> The rating matrix A is decomposed into a matrix U of users x features and a matrix M
> of items x features. Each of these matrices is iteratively recomputed until a maximum number
> of iterations is reached.
> If we assume that U and M fit into the memory of a single mapper instance, each iteration
> can be implemented as a single map-only job, which greatly improves the runtime of this job.
> Note that in spite of these improvements, this job is still rather slow, as Hadoop is a
> poor fit for iterative algorithms. Each iteration has to be scheduled again, and data is always
> read from and written to disk.
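
To make the map-only idea concrete, here is a minimal in-memory sketch of the per-row work such a mapper could do: given one user's ratings and the fixed item matrix M held in memory, it recomputes that user's row of U by solving a small least-squares system. This is not Mahout's actual code. The class and method names (AlsSketch, recomputeRow, solve) are hypothetical, and the regularization term lambda is an assumption that the issue description does not mention. The item half of an iteration would be symmetric, with U held fixed.

```java
import java.util.Map;

public class AlsSketch {

    /** Solves the k x k system a * x = b by Gaussian elimination with partial pivoting. */
    static double[] solve(double[][] a, double[] b) {
        int k = b.length;
        // Build the augmented matrix [a | b] so the inputs are left untouched.
        double[][] m = new double[k][k + 1];
        for (int i = 0; i < k; i++) {
            System.arraycopy(a[i], 0, m[i], 0, k);
            m[i][k] = b[i];
        }
        // Forward elimination.
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int row = col + 1; row < k; row++) {
                if (Math.abs(m[row][col]) > Math.abs(m[pivot][col])) {
                    pivot = row;
                }
            }
            double[] tmp = m[col];
            m[col] = m[pivot];
            m[pivot] = tmp;
            for (int row = col + 1; row < k; row++) {
                double factor = m[row][col] / m[col][col];
                for (int j = col; j <= k; j++) {
                    m[row][j] -= factor * m[col][j];
                }
            }
        }
        // Back substitution.
        double[] x = new double[k];
        for (int i = k - 1; i >= 0; i--) {
            double sum = m[i][k];
            for (int j = i + 1; j < k; j++) {
                sum -= m[i][j] * x[j];
            }
            x[i] = sum / m[i][i];
        }
        return x;
    }

    /**
     * Recomputes one row of U (or of M) while the other matrix is held fixed.
     * ratings maps a row index of the fixed matrix to the observed rating.
     * Solves (sum_i f_i f_i^T + lambda * I) u = sum_i r_i f_i.
     * Assumes lambda > 0 so the system is always solvable (hypothetical choice).
     */
    static double[] recomputeRow(Map<Integer, Double> ratings, double[][] fixed, double lambda) {
        int k = fixed[0].length;
        double[][] a = new double[k][k];
        double[] b = new double[k];
        for (Map.Entry<Integer, Double> e : ratings.entrySet()) {
            double[] f = fixed[e.getKey()];
            double r = e.getValue();
            for (int i = 0; i < k; i++) {
                b[i] += r * f[i];
                for (int j = 0; j < k; j++) {
                    a[i][j] += f[i] * f[j];
                }
            }
        }
        for (int i = 0; i < k; i++) {
            a[i][i] += lambda;
        }
        return solve(a, b);
    }
}
```

Because each call needs only one user's ratings plus the fixed matrix, the rows can be processed independently, which is what makes a map-only formulation possible when the fixed matrix fits in a mapper's memory.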


