mahout-dev mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1069) Multi-target, side-info aware, SGD-based recommender algorithms, examples, and tools to run
Date Tue, 18 Sep 2012 12:23:07 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457769#comment-13457769 ]

Sean Owen commented on MAHOUT-1069:
-----------------------------------

I imagine this is all great work. As I commented off-list, it is a big enough and even different
enough beast that it feels like it should be a separate project. The Mahout code base is already
uneven and sprawling and I think this would exacerbate that -- and not generate much "synergy"
worth the effort of integration.
                
> Multi-target, side-info aware, SGD-based recommender algorithms, examples, and tools to run
> -------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1069
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1069
>             Project: Mahout
>          Issue Type: Improvement
>          Components: CLI, Collaborative Filtering
>    Affects Versions: 0.8
>            Reporter: Gokhan Capan
>            Assignee: Sean Owen
>              Labels: cf, improvement, sgd
>         Attachments: MAHOUT-1069.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Following our conversations on the dev list, I have completed the merge of the recommender algorithms mentioned in http://goo.gl/fh4d9 into Mahout.
> These are a set of learning algorithms for matrix-factorization-based recommendation, which are capable of:
> * Recommending multiple targets:
> *# Numerical Recommendation with OLS Regression
> *# Binary Recommendation with Logistic Regression
> *# Multinomial Recommendation with Softmax Regression
> *# Ordinal Recommendation with Proportional Odds Model
> * Leveraging side info in mahout vector format where available
> *# User side information
> *# Item side information
> *# Dynamic side information (side info at the moment of feedback, such as proximity, day of week, etc.)
> * Online learning
> Some command-line tools are provided as Mahout jobs, for pre-experiment utilities and running experiments.
> Evaluation tools for numerical and categorical recommenders are added.
> A simple example for the MovieLens-1M data is provided, and it achieved pretty good results (0.851 RMSE on a randomly generated test set, after tuning the learning and regularization rates on a separate validation set).
> The existing Mahout code is not modified, apart from the lines added to driver.class.props for the command-line tools. Even so, this became a huge patch with dozens of new source files.
> These algorithms are heavily inspired by various influential recommender-system papers, especially Yehuda Koren's. For example, the ordinal model is from Koren's OrdRec paper, except that the cuts are global rather than user-specific.
> Left for future:
> # The core algorithms are tested, but there are probably some parts that those tests do not cover. I have seen many of them in action without problems, and I am going to add new tests regularly.
> # Not all algorithms have been tried on appropriate datasets, and they may need some improvement. However, I also use the algorithms for my M.Sc. thesis, which means I will eventually submit more experiments. As the experimenting infrastructure exists, I believe the community may provide more experiments, too.
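
For readers unfamiliar with the ordinal model mentioned in the description, the following is a minimal sketch of a proportional odds model with global (not user-specific) cuts. It is not taken from the MAHOUT-1069 patch: the function names, the factor vectors, and the cut values are all hypothetical and chosen only for illustration. The latent score is assumed to be a plain user-factor / item-factor dot product, without the side information the patch supports.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def ordinal_probabilities(score, cuts):
    """Return P(rating = k) for k = 0..len(cuts) under a proportional odds model.

    score: latent prediction (here assumed to be a factor dot product).
    cuts:  strictly increasing thresholds shared by all users (global cuts).
    """
    if any(b <= a for a, b in zip(cuts, cuts[1:])):
        raise ValueError("cuts must be strictly increasing")
    # Cumulative probabilities P(rating <= k) = sigmoid(cut_k - score);
    # the highest level always has cumulative probability 1.
    cumulative = [sigmoid(c - score) for c in cuts] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        # P(rating = k) is the difference of adjacent cumulative probabilities.
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical values, for illustration only.
user_factors = [0.8, -0.2]
item_factors = [1.0, 0.5]
global_cuts = [-1.0, 0.0, 1.0, 2.0]  # 4 cuts give 5 ordered rating levels

probs = ordinal_probabilities(dot(user_factors, item_factors), global_cuts)
print(probs)
```

Because the cuts are shared, every user is scored against the same thresholds. A user-specific variant, as the description attributes to OrdRec, would instead look up a separate list of cuts per user before calling the same function.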

