mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Memory and Speed Questions for Item-Based-Recommender
Date Fri, 10 Jul 2009 09:30:44 GMT
On Fri, Jul 10, 2009 at 10:03 AM, Thomas Rewig<trewig@mufin.com> wrote:
>     Question 1:
>     The similarity matrix uses 400MB in the MySQL DB - after
>     setting the ItemCorrelation, 8GB of RAM is used to load the
>     similarity matrix as a GenericItemSimilarity. Is it
>     possible/plausible that this matrix uses more than 20 times more
>     memory in RAM than in the database - or have I done something wrong?

I could believe this. 100,000 items means about 5,000,000,000
item-item pairs are possible. Many are not kept, but seeing as each
one requires 30 or so bytes of memory, I am not surprised that it
could take 8GB.
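
As a rough check of the arithmetic above, here is a minimal sketch in
plain Java. The class and method names are made up for illustration,
and the 30 bytes per pair is only the approximate figure from the
message, not a measured value.

```java
// Back-of-the-envelope estimate only; names here are hypothetical.
final class SimilarityMemoryEstimate {

    /** Number of unordered item-item pairs among n items: n * (n - 1) / 2. */
    static long possiblePairs(long items) {
        return items * (items - 1) / 2;
    }

    /** How many stored pairs fit in a heap budget at a given per-pair cost. */
    static long pairsThatFit(long heapBytes, long bytesPerPair) {
        return heapBytes / bytesPerPair;
    }

    public static void main(String[] args) {
        long items = 100_000L;
        long bytesPerPair = 30L;                     // rough figure from the message
        long heapBytes = 8L * 1024 * 1024 * 1024;    // 8 GB

        long possible = possiblePairs(items);
        long stored = pairsThatFit(heapBytes, bytesPerPair);
        System.out.println("possible pairs:     " + possible);
        System.out.println("pairs fitting 8 GB: " + stored);
        System.out.println("fraction of all:    " + (double) stored / possible);
    }
}
```

Under these assumptions, 8 GB holds roughly 286 million pairs, which
is under 6% of the pairs that are possible for 100,000 items.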

That's really a lot to keep in memory. I might suggest, instead, that
you not pre-compute the similarities, but instead compute them as
needed and cache (use CachingItemSimilarity). That way you are not
spending so much memory on pairs that may never get used, but still
get much of the speed improvement.
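
The compute-as-needed-and-cache idea can be sketched without any
Mahout classes. The code below is not CachingItemSimilarity itself; it
is a small, hypothetical LRU cache around an arbitrary pairwise
similarity function, assuming the similarity is symmetric. All names
in it (LruSimilarityCache, PairSimilarity) are invented for the
example.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.LinkedHashMap;
import java.util.Map;

final class LruSimilarityCache {

    /** Hypothetical stand-in for whatever computes a similarity on demand. */
    interface PairSimilarity {
        double similarity(long itemA, long itemB);
    }

    private final PairSimilarity delegate;
    private final Map<Map.Entry<Long, Long>, Double> cache;
    private int misses;

    LruSimilarityCache(PairSimilarity delegate, final int maxEntries) {
        this.delegate = delegate;
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache is over capacity.
        this.cache = new LinkedHashMap<Map.Entry<Long, Long>, Double>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Map.Entry<Long, Long>, Double> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized double similarity(long itemA, long itemB) {
        // Assumes symmetry: (a, b) and (b, a) share one cache entry.
        Map.Entry<Long, Long> key =
            new SimpleImmutableEntry<>(Math.min(itemA, itemB), Math.max(itemA, itemB));
        Double cached = cache.get(key);
        if (cached == null) {
            misses++;
            cached = delegate.similarity(itemA, itemB);
            cache.put(key, cached);
        }
        return cached;
    }

    synchronized int misses() { return misses; }

    synchronized int entries() { return cache.size(); }

    public static void main(String[] args) {
        // Toy similarity function, purely for demonstration.
        LruSimilarityCache c =
            new LruSimilarityCache((a, b) -> 1.0 / (1 + Math.abs(a - b)), 2);
        c.similarity(1, 2); // computed
        c.similarity(2, 1); // served from cache
        c.similarity(1, 3); // computed
        c.similarity(1, 4); // computed, evicts the (1, 2) entry
        System.out.println("misses=" + c.misses() + " entries=" + c.entries());
    }
}
```

Memory is then bounded by the cache capacity rather than by the number
of possible pairs, at the cost of recomputing pairs that have been
evicted.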


>     Question 2:
>     How can I reduce the memory consumption of the
>     GenericItemSimilarity? -
>     GenericItemSimilarity(Iterable<GenericItemSimilarity.ItemItemSimilarity> similarities,
>     int maxToKeep)
>     <http://lucene.apache.org/mahout/javadoc/core/org/apache/mahout/cf/taste/impl/similarity/GenericItemSimilarity.html#GenericItemSimilarity%28java.lang.Iterable,%20int%29>
>     doesn't work, because if maxToKeep is too small, the
>     recommendations will be bad ...

Yeah, you are already filtering out many of the less important
correlations anyway. You could filter yet more, to reduce memory
requirements, but I think it's best not to try to store all of
this in memory. It doesn't scale well.


>  2. Speed of Recommendation: I use a MySQLJDBCDataModel - MyISAM.
>     Primary Key and Indexes are set:
>     PRIMARY KEY (user_id, item_id),INDEX (user_id),INDEX (item_id). A
>     Recommendation for a User takes between 0.5 and 80 seconds - I
>     would like this to take just 300ms.
>
>   By the way, I use a quad-core 3.2 GHz machine with 32GB RAM to
>   compute the recommendations, so maybe the DB is the bottleneck. But
>   if I use a FileDataModel it is faster, but not by much.
>
>   Here's a log for a User with 2000 associated Items:
>
>   INFO  CollaborativeModel - Seconds to set ItemCorrelation: 76.187506 s
>   INFO  CollaborativeModel - Seconds to set Recommender: 0.025945000000000003 s
>   INFO  CollaborativeModel - Seconds to set CachingRecommender: 0.06511 s
>   INFO  CollaborativeController - SECONDS TO REFRESH THE SYSTEM: 6.450000000000001E-4 s
>   INFO  root - SECONDS TO GET A RECOMMENDATION FOR USER: 50.888347 s
>
>   Question:
>   Is there a way to increase the speed of a recommendation? (use
>   InnoDB?, compute less Items ... someway ;-)...?)

Your indexes are right. Are you using a connection pool? That is
really important.

How many users do you have? If you have relatively few users, you
might use a user-based recommender instead. Or, consider a slope-one
recommender. It sounds like you have a lot of items, and because of
the way item-based recommenders work, that will be slow.

Using CachingItemSimilarity could help. I am surprised that a
FileDataModel isn't much faster, since it loads data in memory. That
suggests to me that the database isn't the bottleneck.

Are you using multiple threads to compute recommendations
simultaneously? You certainly can, to take advantage of the 4 cores.
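
One way to spread per-user requests over the cores is a fixed thread
pool. This is only a sketch: recommendFor below is a hypothetical
placeholder for the real recommender call, and it assumes that call is
safe to invoke from several threads at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongFunction;

final class ParallelRecommendations {

    /**
     * Runs recommendFor once per user ID on a fixed pool of worker threads
     * and returns the results in the same order as userIds.
     */
    static <T> List<T> recommendAll(List<Long> userIds,
                                    LongFunction<T> recommendFor,
                                    int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<T>> tasks = new ArrayList<>();
            for (final long userId : userIds) {
                tasks.add(() -> recommendFor.apply(userId));
            }
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task is done and preserves task order.
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in; a real version would query the recommender here.
        LongFunction<String> fake = id -> "recs-for-" + id;
        // One worker per core on the quad-core machine from the question.
        System.out.println(recommendAll(Arrays.asList(1L, 2L, 3L), fake, 4));
    }
}
```

This raises throughput across many users; it does not make a single
slow recommendation any faster.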
