mahout-user mailing list archives

From Thomas Rewig <>
Subject Re: Memory and Speed Questions for Item-Based-Recommender
Date Fri, 10 Jul 2009 13:45:55 GMT

> Yes all that is true. Precomputing is reasonable -- it's the storing
> it in memory that is difficult given the size. You could consider
> keeping the similarities in the database instead, and not loading into
> memory, if you are worried about memory. There is not an
> implementation that reads a database table but we could construct one.
:-) Sounds good. Thanks, but I will first test your other suggestions 
and hints.
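Just so I am sure I understand the database idea: a table like 
item_similarity(item_id_a, item_id_b, similarity) that is read per lookup? 
A purely hypothetical sketch of the lookup side (table, column and class 
names are invented by me, and one would still have to wrap this in an 
ItemSimilarity, since as you say no such implementation exists yet):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    // Purely hypothetical: reads precomputed similarities from a table
    //   item_similarity(item_id_a BIGINT, item_id_b BIGINT, similarity DOUBLE)
    public final class SimilarityTableLookup {

        private final DataSource dataSource;

        public SimilarityTableLookup(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public double similarity(long itemA, long itemB) throws SQLException {
            Connection conn = dataSource.getConnection();
            try {
                PreparedStatement stmt = conn.prepareStatement(
                    "SELECT similarity FROM item_similarity"
                    + " WHERE item_id_a = ? AND item_id_b = ?");
                stmt.setLong(1, itemA);
                stmt.setLong(2, itemB);
                ResultSet rs = stmt.executeQuery();
                return rs.next() ? rs.getDouble(1) : Double.NaN; // NaN = unknown pair
            } finally {
                conn.close(); // also releases the statement and result set
            }
        }
    }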
> I don't see how UserSimilarity objects come into this. You would not
> use one in an item-based recommender. There is a CachingItemSimilarity
> for ItemSimilarity classes.
Because I precompute the item-similarity matrix with a UserSimilarity and 
this DB table:

|| aItem || aItemCharacteristic || aItemValue ||

... so:

    * User = aItem
    * Item = aItemCharacteristic
    * Preference = aItemValue

I use this method to get the correlation:

        aCorrelation = aUserSimilarity.userSimilarity(user1, user2);

In my example, this is the similarity between aItem1 and aItem2.
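
Roughly, the whole setup looks like this (only a sketch: the table name and 
DataSource are placeholders, the MySQLJDBCDataModel constructor arguments 
vary between versions, newer ones also take a timestamp column, and the 
long item IDs assume the newer Taste API):

    import javax.sql.DataSource;
    import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    // Transposed model: aItem plays the "user" role, aItemCharacteristic
    // the "item" role, and aItemValue the preference value.
    DataSource dataSource = getPooledDataSource(); // hypothetical helper
    DataModel transposedModel = new MySQLJDBCDataModel(dataSource,
        "item_characteristics",   // hypothetical table name
        "aItem",                  // "user" ID column  -> really the item
        "aItemCharacteristic",    // "item" ID column  -> really the characteristic
        "aItemValue");            // preference column
    UserSimilarity aUserSimilarity =
        new PearsonCorrelationSimilarity(transposedModel);

    // "User" similarity here is really item-item similarity:
    long aItem1 = 1L;
    long aItem2 = 2L;
    double aCorrelation = aUserSimilarity.userSimilarity(aItem1, aItem2);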

If I use a CachingItemSimilarity, I must use an ItemSimilarity:

        aCorrelation = aItemSimilarity.itemSimilarity(item1, item2);

In my example (and in my opinion), this is the similarity between 
aItemCharacteristic1 and aItemCharacteristic2, which isn't interesting 
for me.
So I must use the UserSimilarity objects and the UserBasedRecommender, 
although I would prefer the ItemBasedRecommender.
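
The only way around this that I can see would be a small adapter that 
presents the UserSimilarity as an ItemSimilarity, since the "users" of my 
transposed model are really items. A rough, untested sketch (assuming the 
long-ID interfaces; older Taste versions pass User/Item objects instead, 
and newer ones declare additional batch methods on ItemSimilarity):

    import java.util.Collection;
    import org.apache.mahout.cf.taste.common.Refreshable;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    // Hypothetical adapter: in the transposed model a "user" is really an
    // aItem, so user-user similarity is item-item similarity.
    public final class TransposedItemSimilarity implements ItemSimilarity {

        private final UserSimilarity delegate;

        public TransposedItemSimilarity(UserSimilarity delegate) {
            this.delegate = delegate;
        }

        public double itemSimilarity(long itemID1, long itemID2)
                throws TasteException {
            // itemID1/itemID2 are aItem IDs, which the transposed model calls users
            return delegate.userSimilarity(itemID1, itemID2);
        }

        public void refresh(Collection<Refreshable> alreadyRefreshed) {
            delegate.refresh(alreadyRefreshed);
        }
    }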

... hopefully my line of reasoning is not too confusing ;-).
> What you are doing now is effectively pre-computing all similarities
> and caching them in memory, all of them, ahead of time. Using
> CachingItemSimilarity would simply do that for you, and would probably
> use a lot less memory since only pairs that are needed, and accessed
> frequently, will be put into memory. It won't be quite as fast, since
> it will still be re-computing similarities from time to time. But
> overall you will probably use far less memory for a small decrease in
> performance.
OK, I will try this first.
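
If the adapter sketched above works out, I imagine the wiring would look 
something like this. Again untested, constructor signatures may differ 
between versions, and I am assuming the actual recommendations run over a 
separate DataModel of user preferences on aItems (the hypothetical 
userModel below, which is not shown anywhere above):

    import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.CachingItemSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

    // Hypothetical: the DataModel holding real user preferences over aItems
    // (not the transposed characteristics model from above).
    DataModel userModel = getUserPreferenceModel(); // hypothetical helper

    // Only pairs that are actually requested get cached, instead of
    // holding the whole precomputed matrix in memory.
    ItemSimilarity cached = new CachingItemSimilarity(
        new TransposedItemSimilarity(aUserSimilarity), userModel);
    Recommender recommender =
        new GenericItemBasedRecommender(userModel, cached);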
> Beyond that I could suggest more extreme modifications to the code.
> For example, if you are willing to dig into the code to experiment,
> you can try something like this: instead of considering every single
> item for recommendation every time, pre-compute some subset of items
> that are reasonably popular, and then in the code, only consider
> recommending these. It is not a great approach since you want to
> recommend obscure items sometimes, but could help.
I will think about this, but for the moment I will try to recommend all 
items. If that isn't fast enough and there is no other idea, I will try this.
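
A thought for later: instead of modifying the recommender internals, maybe 
a rescorer could approximate the "popular subset" idea from the outside. 
Sketch only; IDRescorer assumes the long-ID API (older versions have a 
generic Rescorer), and the popular set would be precomputed offline:

    import java.util.List;
    import org.apache.mahout.cf.taste.impl.common.FastIDSet;
    import org.apache.mahout.cf.taste.recommender.IDRescorer;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;

    // Hypothetical: IDs of reasonably popular items, precomputed offline.
    final FastIDSet popularItems = new FastIDSet();
    popularItems.add(123L);
    popularItems.add(456L);

    IDRescorer onlyPopular = new IDRescorer() {
        public double rescore(long id, double originalScore) {
            return originalScore;              // leave scores unchanged
        }
        public boolean isFiltered(long id) {
            return !popularItems.contains(id); // skip non-popular candidates
        }
    };

    // "recommender" as wired up above
    List<RecommendedItem> recs = recommender.recommend(1L, 10, onlyPopular);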
> You should also try using the very latest code from subversion. Just
> this week I have made some pretty good improvements to the JDBC code.
> Also, it sounds like you are trying to do real-time recommendations,
> like synchronously with a user request. This can be hard since it
> imposes such a tight time limit. Consider doing recommendations
> asynchronously if you can. For example, start computing
> recommendations when the user logs in, and maybe on the 2nd page view
> 5 seconds later, you are ready to recommend something.
Yes, real-time is the dream :-), but I know this will be hard to reach. I 
will follow your hints first, and if the worst-case recommendation no 
longer takes 80s, I'm happy :-).
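
The asynchronous idea should be doable with plain java.util.concurrent, 
e.g. start the computation at login and collect it a page view later. A 
sketch with invented names ("recommender" would have to be a final local 
or a field to be visible in the inner class):

    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;

    ExecutorService pool = Executors.newFixedThreadPool(4);

    // At login: kick off the computation in the background.
    final long userID = 1L; // the user who just logged in
    Future<List<RecommendedItem>> pending =
        pool.submit(new Callable<List<RecommendedItem>>() {
            public List<RecommendedItem> call() throws Exception {
                return recommender.recommend(userID, 10);
            }
        });

    // A page view or two (~5s) later: the result is hopefully ready.
    List<RecommendedItem> recs = pending.get(5, TimeUnit.SECONDS);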

best regards

Thomas Rewig
