mahout-user mailing list archives

From Sean Owen <>
Subject Re: Memory and Speed Questions for Item-Based-Recommender
Date Fri, 10 Jul 2009 12:34:47 GMT
On Fri, Jul 10, 2009 at 1:18 PM, Thomas Rewig<> wrote:
> Ok I will test with the CachingUserSimilarity. If I understand you right,
> this will mean I
>   * create a DataModel_1 (MySQLDB) in this way: aItem,
>     aItemCharacteristic, aItemValue (each aItem has 40
>     aItemCharacteristics)
>   * create a UserSimilarity so that I have the similarity of the
>     aItems (if I used ItemSimilarity I would get the similarity of the
>     aItemCharacteristics ... right?)
>   * create a CachingUserSimilarity and put DataModel_1 and the
>     UserSimilarity in there
>   * create a DataModel_2 (MySQLDB) in this way:
>     aUser, aItem, aItemPreference
>   * create the Neighborhood
>   * create a UserBasedRecommender and put the Neighborhood,
>     DataModel_2 and the CachingUserSimilarity in there
>   * create a CachingRecommender
>   * et voilà :-) I have a working memory-sparing recommender
> But I can't do that with an item-based recommender because I have no
> ItemCorrelation (because the similarity of the aItemCharacteristics doesn't
> matter), is that right? So the sentence in the docs -- "So, item-based
> recommenders can use pre-computed similarity values in the computations,
> which make them much faster. For large data sets, item-based recommenders
> are more appropriate" -- doesn't work for me. Or
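For reference, the user-based wiring described in the steps above would look roughly like this in Taste (imports from org.apache.mahout.cf.taste omitted; constructor details, especially for MySQLJDBCDataModel's table and column names, may differ in your checkout, so treat this as a sketch):

```java
// Sketch of the user-based setup above. dataSource1/dataSource2 are your
// javax.sql.DataSources for the two MySQL tables (names assumed here).
DataModel itemModel = new MySQLJDBCDataModel(dataSource1); // aItem / aItemCharacteristic / aItemValue
DataModel prefModel = new MySQLJDBCDataModel(dataSource2); // aUser / aItem / aItemPreference

UserSimilarity rawSimilarity = new PearsonCorrelationSimilarity(itemModel);
UserSimilarity similarity = new CachingUserSimilarity(rawSimilarity, itemModel);

UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, prefModel);
Recommender recommender = new GenericUserBasedRecommender(prefModel, neighborhood, similarity);
Recommender cached = new CachingRecommender(recommender);
```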

Yes, all of that is true. Precomputing is reasonable -- it's storing the
results in memory that is difficult, given the size. If you are worried
about memory, you could keep the similarities in the database instead
and not load them into memory. There is no implementation yet that
reads them from a database table, but we could construct one.
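Such a table-backed lookup could be sketched in plain JDBC like this (the table and column names are made up, and you would still need to wrap this in Taste's ItemSimilarity interface yourself):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Sketch: read precomputed item-item similarities from a DB table instead
 *  of holding them all in memory. Table/column names are assumptions. */
public class JDBCSimilarityLookup {

  private final DataSource dataSource;

  public JDBCSimilarityLookup(DataSource dataSource) {
    this.dataSource = dataSource;
  }

  /** Returns the stored similarity, or NaN if the pair was never computed. */
  public double lookup(long itemA, long itemB) throws SQLException {
    Connection conn = dataSource.getConnection();
    try {
      PreparedStatement stmt = conn.prepareStatement(
          "SELECT similarity FROM item_similarity WHERE item_id_a=? AND item_id_b=?");
      // Store each unordered pair once, smaller ID first, to halve the table.
      stmt.setLong(1, Math.min(itemA, itemB));
      stmt.setLong(2, Math.max(itemA, itemB));
      ResultSet rs = stmt.executeQuery();
      return rs.next() ? rs.getDouble(1) : Double.NaN;
    } finally {
      conn.close();
    }
  }
}
```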

I don't see how UserSimilarity objects come into this; you would not
use one in an item-based recommender. There is a CachingItemSimilarity
for ItemSimilarity implementations.
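If you did have an ItemSimilarity, wrapping it is a one-liner (a sketch; rawItemSimilarity and dataModel stand for whatever similarity and DataModel you already have):

```java
// Item-based analogue of the caching setup: wrap the ItemSimilarity.
ItemSimilarity cachedSimilarity = new CachingItemSimilarity(rawItemSimilarity, dataModel);
Recommender itemBased = new GenericItemBasedRecommender(dataModel, cachedSimilarity);
```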

What you are doing now is effectively pre-computing all similarities
and caching all of them in memory ahead of time. Using
CachingItemSimilarity would simply do that for you, and would probably
use a lot less memory, since only pairs that are actually needed, and
accessed frequently, are kept in memory. It won't be quite as fast,
since it will still re-compute similarities from time to time. But
overall you will probably use far less memory for a small decrease in
speed.
Beyond that, I could suggest more extreme modifications to the code.
For example, if you are willing to dig into the code to experiment,
you can try something like this: instead of considering every single
item for recommendation every time, pre-compute some subset of items
that are reasonably popular, and then, in the code, only consider
recommending those. It is not a great approach, since you sometimes
want to recommend obscure items, but it could help.
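To illustrate the idea outside of Taste's internals (all names here are made up, not Mahout API): precompute a popular subset once, then restrict candidate items to it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PopularSubset {

  /** Pick the N most-preferred item IDs from a popularity-count map. */
  static Set<Long> topN(Map<Long, Integer> prefCounts, int n) {
    List<Map.Entry<Long, Integer>> entries = new ArrayList<>(prefCounts.entrySet());
    entries.sort((a, b) -> b.getValue().compareTo(a.getValue())); // most popular first
    Set<Long> top = new HashSet<>();
    for (Map.Entry<Long, Integer> e : entries) {
      if (top.size() >= n) {
        break;
      }
      top.add(e.getKey());
    }
    return top;
  }

  /** Keep only candidate items that fall in the precomputed popular set. */
  static List<Long> restrict(List<Long> candidates, Set<Long> popular) {
    List<Long> out = new ArrayList<>();
    for (Long id : candidates) {
      if (popular.contains(id)) {
        out.add(id);
      }
    }
    return out;
  }
}
```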

You should also try using the very latest code from subversion. Just
this week I have made some pretty good improvements to the JDBC code.

Also, it sounds like you are trying to compute recommendations in real
time, synchronously with a user request. This is hard, since it
imposes such a tight time limit. Consider computing recommendations
asynchronously if you can. For example, start computing
recommendations when the user logs in; by the second page view, maybe
five seconds later, you are ready to recommend something.
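A plain-Java sketch of that login-time precompute, with computeRecommendations standing in for the real (slow) recommender call -- the class and method names here are hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class AsyncRecommendations {

  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final ConcurrentMap<Long, Future<List<Long>>> pending = new ConcurrentHashMap<>();

  /** Kick off recommendation computation when the user logs in. */
  public void onLogin(final long userID) {
    pending.put(userID, pool.submit(new Callable<List<Long>>() {
      public List<Long> call() {
        return computeRecommendations(userID); // slow Taste call would go here
      }
    }));
  }

  /** A later page view picks up the result, waiting briefly if needed. */
  public List<Long> onPageView(long userID) throws Exception {
    Future<List<Long>> f = pending.get(userID);
    return f == null ? Collections.<Long>emptyList() : f.get(5, TimeUnit.SECONDS);
  }

  public void shutdown() {
    pool.shutdown();
  }

  // Stand-in for recommender.recommend(userID, howMany); hypothetical.
  List<Long> computeRecommendations(long userID) {
    return Arrays.asList(userID + 1, userID + 2);
  }
}
```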

> Yes I do, but every .recommend() call is, inside Taste, only a single
> thread. Is that right?

Yes internally there is no multi-threading. You would do it externally.
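External multi-threading can be as simple as fanning the per-user recommend() calls across a thread pool; recommendOne below is a hypothetical stand-in for the real recommender call:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRecommend {

  /** Fan recommend() calls for many users across a pool; each individual
   *  call is still single-threaded inside Taste. */
  static Map<Long, List<Long>> recommendAll(Collection<Long> userIDs, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    Map<Long, Future<List<Long>>> futures = new LinkedHashMap<>();
    for (final long id : userIDs) {
      futures.put(id, pool.submit(new Callable<List<Long>>() {
        public List<Long> call() {
          return recommendOne(id);
        }
      }));
    }
    Map<Long, List<Long>> results = new LinkedHashMap<>();
    for (Map.Entry<Long, Future<List<Long>>> e : futures.entrySet()) {
      results.put(e.getKey(), e.getValue().get()); // wait for each result
    }
    pool.shutdown();
    return results;
  }

  // Stand-in for cachingRecommender.recommend(id, howMany); hypothetical.
  static List<Long> recommendOne(long userID) {
    return Arrays.asList(userID * 10);
  }
}
```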
