mahout-dev mailing list archives

From "Sebastian Schelter (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAHOUT-657) Sample code to apply SVD to the KDD data
Date Thu, 07 Apr 2011 21:18:06 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-657:
--------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Sample code to apply SVD to the KDD data
> ----------------------------------------
>
>                 Key: MAHOUT-657
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-657
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Collaborative Filtering
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 0.5
>
>         Attachments: MAHOUT-657.patch
>
>
> I was incited by some comments on Twitter to make our SVD-based recommendation code work
> on the KDD data. Here are the results so far:
> The patch contains a tweaked version of ExpectationMaximizationSVDFactorizer
> (org.apache.mahout.cf.taste.example.kddcup.track1.svd.ParallelArraysSGDFactorizer)
> in the examples module that is able to load and process the KDD dataset with a constant
> memory usage of approximately 7 GB (by using primitive arrays for everything).
> Unfortunately it's still very slow: a factorization using 40 features and 25 iterations
> took 10 hours on my desktop PC. As far as I understand the math behind it, the algorithm
> is not parallelizable, but maybe someone can improve my implementation or make it
> compute several factorizations at once.
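
For readers unfamiliar with the approach, here is a minimal sketch of a stochastic gradient descent factorizer that keeps everything in primitive arrays, which is what holds memory usage constant. It is not the code from the patch: the class name, fields, initialization, and update rule are assumptions made for illustration, and the real ParallelArraysSGDFactorizer may differ in every detail.

```java
import java.util.Random;

/** Hypothetical sketch, NOT the class from the MAHOUT-657 patch. */
class ParallelArraysSgdSketch {
    // Parallel primitive arrays: position n in each array describes rating n.
    private final int[] userIndexes;
    private final int[] itemIndexes;
    private final float[] ratings;
    // Feature matrices as flat row-major arrays: [index * numFeatures + f].
    private final double[] userFeatures;
    private final double[] itemFeatures;
    private final int numFeatures;

    ParallelArraysSgdSketch(int[] userIndexes, int[] itemIndexes, float[] ratings,
                            int numUsers, int numItems, int numFeatures, long seed) {
        this.userIndexes = userIndexes;
        this.itemIndexes = itemIndexes;
        this.ratings = ratings;
        this.numFeatures = numFeatures;
        this.userFeatures = new double[numUsers * numFeatures];
        this.itemFeatures = new double[numItems * numFeatures];
        // Small random starting values (an assumed initialization scheme).
        Random random = new Random(seed);
        for (int k = 0; k < userFeatures.length; k++) {
            userFeatures[k] = 0.1 * random.nextDouble();
        }
        for (int k = 0; k < itemFeatures.length; k++) {
            itemFeatures[k] = 0.1 * random.nextDouble();
        }
    }

    /** Predicted rating: dot product of the user and item feature vectors. */
    double predict(int user, int item) {
        int u = user * numFeatures;
        int i = item * numFeatures;
        double sum = 0.0;
        for (int f = 0; f < numFeatures; f++) {
            sum += userFeatures[u + f] * itemFeatures[i + f];
        }
        return sum;
    }

    /** One SGD step per rating, repeated numIterations times over all ratings. */
    void train(int numIterations, double learningRate, double regularization) {
        for (int iteration = 0; iteration < numIterations; iteration++) {
            for (int n = 0; n < ratings.length; n++) {
                int u = userIndexes[n] * numFeatures;
                int i = itemIndexes[n] * numFeatures;
                double err = ratings[n] - predict(userIndexes[n], itemIndexes[n]);
                for (int f = 0; f < numFeatures; f++) {
                    double uf = userFeatures[u + f];
                    double itf = itemFeatures[i + f];
                    // Regularized gradient step on both feature vectors.
                    userFeatures[u + f] += learningRate * (err * itf - regularization * uf);
                    itemFeatures[i + f] += learningRate * (err * uf - regularization * itf);
                }
            }
        }
    }

    /** RMSE of the current model on the ratings it was trained on. */
    double trainingRmse() {
        double sumOfSquares = 0.0;
        for (int n = 0; n < ratings.length; n++) {
            double err = ratings[n] - predict(userIndexes[n], itemIndexes[n]);
            sumOfSquares += err * err;
        }
        return Math.sqrt(sumOfSquares / ratings.length);
    }
}
```

Each update reads feature values written by the previous update, which is one way to see why a single factorization is hard to split across threads, while several independent factorizations with different parameters could run side by side.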
> I took a wild guess at the parameters and got an RMSE of 23.35 on the validation set
> and an RMSE of 26.1287 on the secret test ratings (that's rank 63 at the time of this writing).
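
As a reminder of what these numbers measure, RMSE is the square root of the mean squared difference between predicted and actual ratings. The helper below is a hypothetical illustration with assumed names, not the evaluation code used by the patch or by the contest.

```java
/** Hypothetical helper for illustration; not part of the MAHOUT-657 patch. */
class RmseSketch {
    /** Root mean squared error between two equally long, non-empty arrays. */
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumOfSquares = 0.0;
        for (int n = 0; n < predicted.length; n++) {
            double err = predicted[n] - actual[n];
            sumOfSquares += err * err;
        }
        return Math.sqrt(sumOfSquares / predicted.length);
    }
}
```

Because the errors are squared before averaging, a few badly mispredicted ratings raise the score more than many small misses.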
> Would love to see people play with this code and improve it!
> To use this, have a look at the parameters in
> *org.apache.mahout.cf.taste.example.kddcup.track1.svd.Track1SVDRunner*, change them as you
> see fit, and run that class with two arguments: the path to the KDD data directory and the
> path to the file where the results should be stored. In my tests I used
> *-Xms6700M -Xmx6700M* to give the JVM enough memory for 40 features.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
