mahout-user mailing list archives

From Sean Owen
Subject Re: Memory and Speed Questions for Item-Based-Recommender
Date Mon, 13 Jul 2009 16:21:06 GMT
Nice, well that is pretty much the definition of "item-based
collaborative filtering"! It would be interesting to see how it scales
indeed. This doesn't include a notion of item ratings (well, maybe the
"documents" can include the item tokens several times to indicate a
stronger association) but that is not a necessary condition for good
recommendations. I think the equivalent in CF is a combination of 1)
an item-based recommender and 2) the log-likelihood similarity metric.
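
[Editor's sketch] To make the log-likelihood similarity mentioned above concrete, here is a minimal Python sketch of the log-likelihood ratio computed from a 2x2 cooccurrence table, in the entropy form. The function names and the `histories` input (a list of item sets, one per user) are assumptions made for illustration; this is not code from the thread or from any library.

```python
import math

def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def log_likelihood_ratio(k11, k12, k21, k22):
    # k11: users with both items, k12: only the first item,
    # k21: only the second item, k22: neither item.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))

def item_llr(histories, a, b):
    # histories: hypothetical list of sets of item ids, one set per user.
    k11 = sum(1 for h in histories if a in h and b in h)
    k12 = sum(1 for h in histories if a in h and b not in h)
    k21 = sum(1 for h in histories if a not in h and b in h)
    k22 = len(histories) - k11 - k12 - k21
    return log_likelihood_ratio(k11, k12, k21, k22)
```

A score near zero means the two items cooccur about as often as chance would predict; larger scores indicate a stronger departure from independence. Note that only presence or absence is used, so no item ratings are needed.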

On Mon, Jul 13, 2009 at 4:11 PM, Ted Dunning wrote:
> Also, Lucene automagically does weighting which is very, very similar to
> exactly what you want.
> To Sean's question, the trick is that Lucene can store a list of item-item
> links that were filtered by cooccurrence statistics to form a binary matrix
> of interesting links.  Then if you query with a user's recent history of
> items as a query, you get back a list of items formed by considering
> different items to be weighted according to rarity.
> The result is quite good and very fast.  The reason is that Lucene *is*
> weighted matrix multiplication of just the right sort.  This is what I was
> going to talk about in detail at ApacheCon.
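
[Editor's sketch] The retrieval step described in the quoted message can be illustrated without Lucene. The Python sketch below assumes each item is stored as a "document" whose tokens are the items linked to it after cooccurrence filtering, and scores documents by a sum of rarity weights over the history items they contain. The `links` data, the `recommend` function, and the simplified `log(1 + N/df)` weight are all assumptions for illustration; Lucene's actual scoring formula is different and more elaborate.

```python
import math
from collections import Counter

def recommend(links, history, top_n=10):
    # links: hypothetical dict mapping item -> set of linked items
    #        (one row of the binary matrix of interesting links).
    # history: set of items the user interacted with recently (the query).
    n_docs = len(links)
    # Document frequency: in how many item "documents" each token appears.
    df = Counter(token for tokens in links.values() for token in tokens)
    scores = {}
    for item, tokens in links.items():
        if item in history:
            continue  # do not recommend what the user already has
        # Rare history items contribute more than common ones.
        score = sum(math.log(1 + n_docs / df[h]) for h in history if h in tokens)
        if score > 0:
            scores[item] = score
    # Highest score first; ties broken by item id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Toy data: "pop" is linked from two items, "rare" from only one.
links = {
    "x": {"pop"},
    "y": {"pop"},
    "z": {"rare"},
    "pop": {"x", "y"},
    "rare": {"z"},
}
ranked = recommend(links, {"pop", "rare"})
```

In this toy data the item linked to the rarer history item ("z") ranks ahead of the items linked to the more common one ("x" and "y"). Summing weights over matching tokens is the same as multiplying the binary link matrix by a weighted history vector, which is the sense in which the query acts as a weighted matrix multiplication.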
