mahout-user mailing list archives

From Ted Dunning
Subject Re: Memory and Speed Questions for Item-Based-Recommender
Date Mon, 13 Jul 2009 17:35:03 GMT
On Mon, Jul 13, 2009 at 9:21 AM, Sean Owen wrote:

> It would be interesting to see how it scales
> indeed.

It scales very well.  At Veoh we were serving about 400 queries per second
at one point.  This included both searches and recommendations, and if I
remember correctly, at one time more than half were recommendations.

> This doesn't include a notion of item ratings (well, maybe the
> "documents" can include the item tokens several times to indicate a
> stronger association) but that is not a necessary condition for good
> recommendations.

Actually it does.  That is in the off-line part.

But, as you likely know by now, I am an anti-fan of using ratings for
recommendations.  I think that the data is suspect and is generally about
two orders of magnitude smaller than other viewing data.  Given that it is
lower quality and vastly smaller, I see no utility in actually spending
thought on using that kind of data.  Often you can use that data for free,
but that is the only price I would pay.

This is not the same as saying you should not allow users to rate things and
share ratings and so on.  Users enjoy doing that.  I just think that the
data is next to useless compared to the alternatives.

> I think the equivalent in CF is a combination of 1)
> an item-based recommender and 2) the log-likelihood similarity metric.

Indeed.  And the Lucene-based recommender effectively uses (2) twice: first
in the off-line reduction of data, and second in the implicit weighting
performed by Lucene.
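
For readers who have not met the log-likelihood metric in (2), here is a
minimal sketch of the usual 2x2 log-likelihood ratio (G^2) test applied to
item co-occurrence counts. It is an illustration only, not the code that ran
at Veoh or the code in Mahout. The names k11, k12, k21, k22 and the helper
functions are assumptions made for this example: k11 counts users who
interacted with both items, k12 and k21 count users who interacted with only
one of them, and k22 counts users who interacted with neither.

```python
import math


def x_log_x(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts: N * H(counts / N)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts.

    Large values mean the two items co-occur more (or less) often than
    independence would predict; values near zero mean no evidence of
    association.
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb tiny negative values from floating-point error.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))
```

In an off-line reduction of the kind described here, one would typically
keep only the item pairs whose score exceeds some cutoff and discard the
rest. The cutoff itself is a tuning choice that this message does not
specify.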

It is also worth noting that it is a piece of cake to integrate various
search functions into this kind of architecture.  Thus, filtering
recommendations by some boolean constraint, or biasing them with a textual
query or a recency preference, is trivial.
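
To make the search integration concrete, here is a toy sketch in plain
Python of recommendation as a search query with an added filter and a text
boost. Everything in it is hypothetical: the field names ("indicators",
"lang", "text"), the recommend() function, and the overlap-count scoring are
assumptions for illustration. A real deployment would let Lucene do the
matching and weighting, and this sketch does not reproduce Lucene's scoring.

```python
def recommend(items, history, must=None, text_terms=(), text_boost=1.0, top_n=3):
    """Rank items whose indicator field overlaps the user's history.

    items      -- dict: item id -> {"indicators": set, "text": set, ...}
    history    -- item ids the user has already interacted with (the query)
    must       -- optional predicate on an item document (boolean filter)
    text_terms -- optional words that bias the ranking (textual query)
    """
    seen = set(history)
    terms = set(text_terms)
    scored = []
    for item_id, doc in items.items():
        # Skip items the user already has, and items failing the filter.
        if item_id in seen or (must is not None and not must(doc)):
            continue
        # Base score: how many indicator items appear in the history.
        score = len(doc["indicators"] & seen)
        if score == 0:
            continue
        # Optional bias: add a bonus for each matching text term.
        score += text_boost * len(doc["text"] & terms)
        scored.append((score, item_id))
    # Highest score first; break ties by item id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:top_n]]


# Hypothetical catalog: indicators would come from the off-line reduction.
catalog = {
    "a": {"indicators": {"x", "y"}, "lang": "en", "text": {"cooking"}},
    "b": {"indicators": {"x"}, "lang": "en", "text": {"surfing", "waves"}},
    "c": {"indicators": {"x", "y"}, "lang": "de", "text": {"cooking"}},
}
english_only = recommend(catalog, ["x", "y"], must=lambda d: d["lang"] == "en")
```

The point of the sketch is that the filter and the boost are just extra
clauses on the same query, so adding them does not change the recommender
itself. A recency preference could be added the same way, as one more term
in the score.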
