mahout-user mailing list archives

From: Sean Owen <sro...@gmail.com>
Subject: Re: Need some pointers towards algorithm capabilities.
Date: Thu, 16 Dec 2010 10:56:14 GMT
This should address much of that:
https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation
As does the book, yes.

The answer also depends on whether you want a Hadoop-based job, for which
there is not much written yet, or the more mature non-distributed version.

For Hadoop, there is an item-based recommender with pluggable similarity
metrics. For the non-distributed version, there's much more.
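A minimal Java sketch of what "pluggable" means here, assuming boolean (implicit) data in which each item is represented by the set of IDs of the users who interacted with it. The `SimilarityMetric` interface and both implementations are hypothetical illustrations, not Mahout's own classes. The calling code depends only on the interface, so one metric can be swapped for another without changing anything else.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface: similarity between two items, each given as the
// set of user IDs that interacted with it (boolean/implicit data).
interface SimilarityMetric {
    double similarity(Set<Long> usersOfA, Set<Long> usersOfB);
}

// Tanimoto (Jaccard) coefficient: size(A intersect B) / size(A union B).
class TanimotoMetric implements SimilarityMetric {
    public double similarity(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }
}

// Cosine similarity on binary vectors: size(A intersect B) / sqrt(|A| * |B|).
class BinaryCosineMetric implements SimilarityMetric {
    public double similarity(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return intersection.size() / Math.sqrt((double) a.size() * b.size());
    }
}

class PluggableSimilarityDemo {
    public static void main(String[] args) {
        Set<Long> itemA = Set.of(1L, 2L, 3L);
        Set<Long> itemB = Set.of(2L, 3L, 4L, 5L);
        // The caller only sees the interface, so the metric is swappable.
        SimilarityMetric[] metrics = {new TanimotoMetric(), new BinaryCosineMetric()};
        for (SimilarityMetric m : metrics) {
            System.out.println(m.getClass().getSimpleName() + ": " + m.similarity(itemA, itemB));
        }
    }
}
```

For the example items, Tanimoto gives 2/5 = 0.4 and binary cosine gives 2/sqrt(12), about 0.577; which metric suits a given data set is an empirical question.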

Explicit vs. implicit ratings and factoring in time are "out of scope": you
can collect data however you want and adjust it however you want. What
matters is what's fed into the framework. So the answer is yes, that's
supported just fine, but outside the framework itself.
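Because that adjustment happens outside the framework, it can be an ordinary preprocessing step. The Java sketch below is one hypothetical way to turn implicit events into preference values while "aging" them: each event is weighted by exponential decay with an assumed half-life, and the weights are summed per user-item pair. The class name, the half-life, and the decay scheme are all assumptions for illustration. The output lines follow the userID,itemID,value CSV layout that Mahout's non-distributed FileDataModel reads. The record syntax requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical preprocessing step, run before data reaches the recommender.
class DecayedPreferences {

    // One implicit event (e.g. a page view) and its age in days.
    record Event(long userId, long itemId, double ageDays) {}

    // Weight of an event that is ageDays old; it halves every halfLifeDays.
    static double decayWeight(double ageDays, double halfLifeDays) {
        return Math.pow(0.5, ageDays / halfLifeDays);
    }

    // Sums decayed weights per (user, item) pair and emits "user,item,value" lines.
    static List<String> toPreferenceLines(List<Event> events, double halfLifeDays) {
        Map<String, Double> totals = new TreeMap<>();
        for (Event e : events) {
            String key = e.userId() + "," + e.itemId();
            totals.merge(key, decayWeight(e.ageDays(), halfLifeDays), Double::sum);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Double> entry : totals.entrySet()) {
            lines.add(entry.getKey() + "," + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event(1, 100, 0.0),   // viewed today: weight 1.0
            new Event(1, 100, 30.0),  // viewed 30 days ago: weight 0.5
            new Event(1, 200, 60.0)); // viewed 60 days ago: weight 0.25
        toPreferenceLines(events, 30.0).forEach(System.out::println);
    }
}
```

With a 30-day half-life, the example prints `1,100,1.5` (one view today plus one 30 days old) and `1,200,0.25`. Rerunning the step as events age is what makes older behaviour count for less over time.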

Long-tail data is fine if you choose the right algorithm, and algorithms
vary a lot in this regard. For example, a user-based or item-based
recommender with log-likelihood similarity, or an SVD-based recommender,
doesn't suffer as much from these issues.
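To illustrate why log-likelihood similarity copes better with sparse data, here is a standalone Java sketch of Dunning's log-likelihood ratio over a 2x2 table of user counts for two items, written in the entropy form (row, column, and matrix entropy). It is an illustrative re-implementation, not Mahout's class. The mapping 1 - 1/(1 + LLR) into a similarity in [0, 1) is included as an assumed convention.

```java
// Standalone sketch of Dunning's log-likelihood ratio (G^2) for a 2x2 table.
class LogLikelihoodSketch {

    // x * ln(x), with the convention 0 * ln(0) = 0.
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: N*ln(N) - sum(x*ln(x)), where N is the sum of counts.
    private static double entropy(long... counts) {
        long sum = 0;
        double sumXLogX = 0.0;
        for (long c : counts) {
            sumXLogX += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - sumXLogX;
    }

    /**
     * k11: users who interacted with both A and B
     * k12: users with A but not B
     * k21: users with B but not A
     * k22: users with neither
     */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // Assumed mapping of the unbounded LLR score into [0, 1).
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        // 1,000 users. Two rare items seen together by a single user...
        System.out.println(logLikelihoodRatio(1, 0, 0, 999));
        // ...versus two items seen together by 100 users.
        System.out.println(logLikelihoodRatio(100, 0, 0, 900));
    }
}
```

In both example pairs the two items were seen by exactly the same users, so a Tanimoto coefficient would be 1.0 for each. The log-likelihood ratio is roughly 16 for a single shared user out of 1,000 and roughly 650 for 100 shared users, so a lone coincidence among rarely seen items is not mistaken for strong evidence.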

The distributed version is necessarily batch; it's Hadoop, after all.
The non-distributed version is entirely real-time, with incremental updates.
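The practical difference between batch and incremental updating can be sketched in a few lines of Java. This is a hypothetical in-memory, item-based model over boolean data, not Mahout's non-distributed API: every new interaction updates co-occurrence counts immediately, so the next query reflects it without a rebuild.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory, item-based model over boolean (implicit) data.
// Each new interaction updates co-occurrence counts immediately, so the
// next call to recommend() already reflects it; there is no batch rebuild.
class IncrementalCooccurrence {
    private final Map<Long, Set<Long>> itemsByUser = new HashMap<>();
    // cooccurrence.get(a).get(b) = number of users who interacted with both a and b
    private final Map<Long, Map<Long, Integer>> cooccurrence = new HashMap<>();

    void addInteraction(long userId, long itemId) {
        Set<Long> items = itemsByUser.computeIfAbsent(userId, u -> new HashSet<>());
        if (!items.add(itemId)) {
            return; // already known; nothing to update
        }
        for (long other : items) {
            if (other == itemId) {
                continue;
            }
            increment(itemId, other);
            increment(other, itemId);
        }
    }

    private void increment(long a, long b) {
        cooccurrence.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
    }

    // Returns the unseen item that co-occurs most often with the user's items, or -1.
    long recommend(long userId) {
        Set<Long> seen = itemsByUser.getOrDefault(userId, Set.of());
        Map<Long, Integer> scores = new HashMap<>();
        for (long item : seen) {
            for (Map.Entry<Long, Integer> e : cooccurrence.getOrDefault(item, Map.of()).entrySet()) {
                if (!seen.contains(e.getKey())) {
                    scores.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
        }
        return scores.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(-1L);
    }

    public static void main(String[] args) {
        IncrementalCooccurrence model = new IncrementalCooccurrence();
        long[][] log = {{1, 10}, {1, 20}, {2, 10}, {2, 20}, {3, 10}, {3, 30}, {4, 10}};
        for (long[] row : log) {
            model.addInteraction(row[0], row[1]);
        }
        System.out.println(model.recommend(4)); // 20
        // Two more users pick item 30; the very next query reflects it.
        model.addInteraction(5, 10);
        model.addInteraction(5, 30);
        model.addInteraction(6, 10);
        model.addInteraction(6, 30);
        System.out.println(model.recommend(4)); // 30
    }
}
```

The trade-off is memory and per-update cost: each new interaction touches every item already in that user's history, and all counts live in RAM, so this approach is bounded by what fits on one machine.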

I am not sure what you mean by preprocessing daily data sets?

On Thu, Dec 16, 2010 at 10:35 AM, Niels Basjes <Niels@basjes.nl> wrote:

> Hi,
>
> I'm an experienced developer yet a complete newbie when it comes to
> the type of functionality Mahout offers.
> I do have some experience in designing and writing MapReduce jobs in
> Hadoop, so I understand enough of the underlying platform.
>
> I want to investigate and experiment with both the item-item and
> user-item recommenders in Mahout.
> The problem I have is that I'm having a hard time finding a good
> overview of the capabilities of the various algorithms.
> Most Wikipedia articles immediately dive into the underlying
> mathematical foundations instead of the practical implications I'm
> looking for.
> I've also not been able to find what I'm looking for in the Mahout
> Wiki/Confluence.
>
> Put simply, I'm looking for a comprehensive overview of
> - the kind of things you can and cannot do with the various algorithms
> that are available in Mahout.
>    - can it handle both "implicit" and "explicit" ratings?
>    - can I 'age' the relevance of the (implicit) ratings? I.e.,
> recommendations should change as tastes change.
>    - how does it handle long-tail situations (with millions of
> items, most are viewed/rated only very infrequently)?
> - what are the scaling properties of the algorithms?
>    - is it always batch or can I do real-time incremental updates
> with new ratings?
>    - can I preprocess a daily dataset and then combine the daily sets
> into "what I need"?
>
> Thanks for any info you can point me to.
>
> --
> Best regards,
>
> Niels Basjes
>
