mahout-user mailing list archives

From Ferran Muñoz <>
Subject Re: Questions on Cooccurrence Recommenders with Spark
Date Fri, 27 Feb 2015 08:55:08 GMT
Thanks for your very good explanation about ratings; I agree with your
opinion. I am using MovieLens only for test purposes. An application that
offers recommendations from a cooccurrence recommender must use the users'
actions as its training set.

Regarding my question about the tags and the content-based indicator, I will
try to explain it better. Using the spark-rowsimilarity function, we get a
content-type indicator. The first output line (in the example) is
"3459860b<tab>3459860b 3459860b 6749860c 5959860a 3434860a 3477860a". This
means that in the document for the "3459860b" item we have to add a new
field (I will call it "tags-indicator") with the content "3459860b 3459860b
6749860c 5959860a 3434860a 3477860a". We then have to do the same for every
document returned by spark-rowsimilarity. Now we are ready to issue
queries using this content-based indicator. Am I right?
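
A minimal sketch of the indexing step described above, assuming each spark-rowsimilarity output line looks like the example (an item ID, a tab, then space-separated similar item IDs). The field name "tags-indicator" comes from this message; the function name and the dict-based documents are hypothetical stand-ins for whatever the search engine's indexing API expects. If the real output carries strength values next to the IDs, they would have to be stripped first.

```python
def attach_indicator(lines, docs, field="tags-indicator"):
    """Add the similar-item list from each output line to that item's document."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Everything before the first tab is the item ID, the rest is the indicator.
        item_id, _, similar = line.partition("\t")
        # Keep the indicator as one space-separated string so the search
        # engine can tokenize it like any other text field.
        doc = docs.setdefault(item_id, {"id": item_id})
        doc[field] = " ".join(similar.split())
    return docs


# Hypothetical in-memory documents keyed by item ID.
docs = attach_indicator(
    ["3459860b\t3459860b 3459860b 6749860c 5959860a 3434860a 3477860a"],
    {"3459860b": {"id": "3459860b",
                  "tags": "men long-sleeve chambray clothing casual"}},
)
```

The existing "tags" field is left untouched; only the new indicator field is added to each document.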

When we want to issue queries, we do the following. If our user has
purchased item 3459860b (continuing with the example), then we
issue the following query:

field: purchase; q: 3459860b
field: tags-indicator; q: 3459860b

Besides that, if we want our results skewed towards items whose tags are
similar to those of the items the user has already purchased (without using
the content-based indicator), we can issue the following query:

field: purchase; q: 3459860b
field: tags; q: men long-sleeve chambray clothing casual
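
The two queries above can be sketched as multi-field requests. This assumes an Elasticsearch-style bool/should query body, which is only one possible search engine; the helper name `build_query` is hypothetical, and the field names are the ones used in this message.

```python
import json


def build_query(clauses):
    """Combine (field, text) pairs into one OR-style query body."""
    return {
        "query": {
            "bool": {
                # Each clause is optional; documents matching more clauses score higher.
                "should": [{"match": {field: text}} for field, text in clauses]
            }
        }
    }


# Query using the content-based indicator field.
indicator_query = build_query([
    ("purchase", "3459860b"),
    ("tags-indicator", "3459860b"),
])

# Query skewed towards the raw tags of purchased items.
tags_query = build_query([
    ("purchase", "3459860b"),
    ("tags", "men long-sleeve chambray clothing casual"),
])

print(json.dumps(indicator_query, indent=2))
```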

Is that OK? Or am I misunderstanding something about using the content-based
indicator?

Ferran Muñoz

2015-02-27 2:16 GMT+01:00 Pat Ferrel <>:

> Long answer:
>
> Preferred tags is an example of an action that would not lead to
> recommendations in any other type of recommender. A user takes many actions
> in your app, and not all of them have “purchase” intent behind them. What the
> cross-cooccurrence stuff does is find actions that correlate with the
> action you want to recommend. Don’t get too hung up on that before you
> understand the basics; it is a way to make better use of your data.
>
> The cooccurrence recommender does not use ratings. In fact, any Mahout
> recommender that uses LLR ignores ratings. Ratings are very hard to use in
> practice since no two people rate on the same scale and the same person is
> often inconsistent about ratings. It is more important to find an indicator
> of preference and focus on _ranking_ better. Ask yourself if you want to
> predict a rating or show the user the things you think they will like in
> the right order (you can only recommend a fixed number of things after
> all). Not even Netflix, who led us into thinking ratings were important,
> uses rating predictions to make recommendations anymore, and they have
> stated this publicly.
>
> Short answer:
>
> Feed MovieLens in and you will get ranked ratings out of the system (it
> requires a search engine to query, don’t forget). If you want to toss the
> very low ratings the answers might be a little better, but the fact that a
> user cared enough to watch the movie is the important thing.
>
> On Feb 26, 2015, at 12:08 AM, Ferran Muñoz <> wrote:
> Hello,
> I have read the "Intro to Cooccurrence Recommenders with Spark" page of the
> Mahout documentation and I have a question regarding the unified
> recommender query. What exactly does "user's-tags-associated-with-purchases"
> mean? Does it mean that I have to put in tags or item IDs?
>
> I understand that the "tags" field of each item document contains the tags
> of that particular item. Then, what query do I have to write in order to
> get recommended items using the content-based indicator?
>
> On the other hand, how can I use ratings when computing
> spark-itemsimilarity? For example, how can I use spark-itemsimilarity
> to get recommendations for the MovieLens dataset (it has ratings, not
> boolean values)?
>
> Thank you in advance.
> Ferran
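
On the MovieLens question, a rough sketch of turning rating rows into the boolean action rows discussed in this thread. It assumes tab-separated input lines of user ID, item ID, rating, timestamp and writes comma-separated user, action, item rows. The action name "watch", the `min_rating` cutoff, and the function name are hypothetical, and the exact column layout spark-itemsimilarity expects should be checked against its options.

```python
def ratings_to_actions(lines, min_rating=None, action="watch", sep="\t"):
    """Convert rating rows into 'user,action,item' rows, ignoring the rating value."""
    rows = []
    for line in lines:
        parts = line.strip().split(sep)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        user, item, rating = parts[0], parts[1], float(parts[2])
        # Optionally toss very low ratings; otherwise any rating counts as an action.
        if min_rating is not None and rating < min_rating:
            continue
        rows.append(f"{user},{action},{item}")
    return rows


# Made-up sample rows, not taken from the real dataset.
sample = ["1\t10\t5\t0", "1\t20\t1\t0"]
all_actions = ratings_to_actions(sample)
liked_only = ratings_to_actions(sample, min_rating=2)
```

With `min_rating` left unset, every rated movie becomes one action row, which matches the point that watching the movie at all is the important signal.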
