mahout-user mailing list archives

From Thomas Rewig <>
Subject Memory and Speed Questions for Item-Based-Recommender
Date Fri, 10 Jul 2009 09:03:52 GMT
Hello Taste-Community,

For a few weeks I have been testing mahout-taste (Apache Mahout release 0.1), 
and I like it :-)!

I have created a working item-based recommender, and now I have some 
questions about speed and memory. Maybe you can give me a hint about 
what I should improve.

   1. ItemCorrelation: I precompute all correlations for approximately
      100000 items and save a pair in a MySQL database if its
      correlation is above a threshold, e.g. 0.95. Then I load the
      correlations into the recommender like this:

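          import java.io.BufferedReader;
          import java.io.FileReader;
          import java.util.ArrayList;
          // Taste classes; package names may differ between Mahout versions:
          import org.apache.mahout.cf.taste.impl.model.GenericItem;
          import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
          import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
          import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
          import org.apache.mahout.cf.taste.model.Item;

          // ... inside the recommender class:

          // Use the _precomputed_ item-item correlations.
          // Each line of the file has the form "itemID1,itemID2,correlation".
          correlationMatrix =
              new ArrayList<GenericItemSimilarity.ItemItemSimilarity>();

          // Open the file ("correlationFile" is a placeholder; the
          // actual path is not given here):
          BufferedReader inStream =
              new BufferedReader(new FileReader(correlationFile));
          try {
              String strLine;
              while ((strLine = inStream.readLine()) != null) {
                  String[] splitArray = strLine.split(",");
                  Item item1 = new GenericItem<String>(splitArray[0]);
                  Item item2 = new GenericItem<String>(splitArray[1]);
                  // Add each pair to the list that backs the similarity:
                  correlationMatrix.add(new GenericItemSimilarity.ItemItemSimilarity(
                      item1, item2, Double.parseDouble(splitArray[2])));
              }
          } finally {
              inStream.close();
          }

          // Set the ItemSimilarity:
          this.itemSimilarity = new GenericItemSimilarity(correlationMatrix);

          // Set the Recommender:
          recommender =
              new GenericItemBasedRecommender(super.getModel(), itemSimilarity);

          // Set the CachingRecommender:
          this.cachingRecommender = new CachingRecommender(recommender);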

      Question 1:
      The similarity matrix takes 400 MB in the MySQL database, but
      when I set the ItemCorrelation, 8 GB of RAM are used to load the
      similarity matrix as a GenericItemSimilarity. Is it
      possible/plausible that this matrix needs more than 20 times as
      much memory in RAM as in the database, or have I done something
      wrong?

      Question 2:
      How can I reduce the memory consumption of the
      GenericItemSimilarity? The constructor
      GenericItemSimilarity(..., int maxToKeep)
      doesn't help, because if maxToKeep is too small, the
      recommendations become bad ...

   2. Speed of recommendation: I use a MySQLJDBCDataModel (MyISAM).
      The primary key and indexes are set:
      PRIMARY KEY (user_id, item_id), INDEX (user_id), INDEX (item_id).
      A recommendation for a user takes between 0.5 and 80 seconds; I
      would like it to take just 300 ms.

    By the way, I use a quad-core 3.2 GHz machine with 32 GB of RAM to
    compute the recommendations, so maybe the DB is the bottleneck. With
    a FileDataModel it is faster, but not by much.

    Here's a log for a user with 2000 associated items:

    INFO  CollaborativeModel - Seconds to set ItemCorrelation: 76.187506 s
    INFO  CollaborativeModel - Seconds to set Recommender: 0.025945000000000003 s
    INFO  CollaborativeModel - Seconds to set CachingRecommender: 0.06511 s
    INFO  CollaborativeController - SECONDS TO REFRESH THE SYSTEM: 6.450000000000001E-4 s

    Is there a way to increase the speed of a recommendation? (Use
    InnoDB? Compute fewer items? Something else ;-)?)
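Regarding the memory questions: in the loading loop above, every line creates one ItemItemSimilarity object plus two new GenericItem objects and two new ID strings, so per-pair object overhead is one plausible reason the in-memory size far exceeds the 400 MB in MySQL (however GenericItemSimilarity indexes the pairs internally adds more). For comparison, here is a minimal sketch in plain Java (not Taste code; `CompactSimilarityStore` and its methods are hypothetical) that keeps the same `item1,item2,correlation` lines as one packed long key (two dense int item indices) plus one float per pair, in sorted parallel arrays, i.e. about 12 bytes per pair before array slack. It cannot be passed to GenericItemBasedRecommender as-is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical compact similarity store (not part of Taste): each pair is one
// packed long key (two dense int item indices) plus one float, in sorted arrays.
class CompactSimilarityStore {
    private final Map<String, Integer> index = new HashMap<>();
    private long[] keys = new long[16];
    private float[] values = new float[16];
    private int size = 0;

    private int indexOf(String itemId) {
        // assign each distinct item ID a dense int index the first time it is seen
        return index.computeIfAbsent(itemId, id -> index.size());
    }

    private static long pairKey(int a, int b) {
        // order-independent key: smaller index in the high 32 bits
        int lo = Math.min(a, b), hi = Math.max(a, b);
        return ((long) lo << 32) | hi;
    }

    /** Reads "itemID1,itemID2,correlation" lines and keeps pairs above the threshold. */
    void load(BufferedReader in, double threshold) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(",");
                double sim = Double.parseDouble(parts[2]);
                if (sim <= threshold) continue; // drop weakly correlated pairs
                if (size == keys.length) { // grow both arrays together
                    int cap = Math.max(16, size * 2);
                    keys = Arrays.copyOf(keys, cap);
                    values = Arrays.copyOf(values, cap);
                }
                keys[size] = pairKey(indexOf(parts[0]), indexOf(parts[1]));
                values[size] = (float) sim;
                size++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        sortByKey();
    }

    private void sortByKey() {
        // sort the parallel arrays by key so lookups can use binary search
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++) order[i] = i;
        Arrays.sort(order, (x, y) -> Long.compare(keys[x], keys[y]));
        long[] k = new long[size];
        float[] v = new float[size];
        for (int i = 0; i < size; i++) {
            k[i] = keys[order[i]];
            v[i] = values[order[i]];
        }
        keys = k;
        values = v;
    }

    /** Returns the stored similarity, or NaN if the pair was not kept. */
    double similarity(String item1, String item2) {
        Integer a = index.get(item1), b = index.get(item2);
        if (a == null || b == null) return Double.NaN;
        int pos = Arrays.binarySearch(keys, 0, size, pairKey(a, b));
        return pos >= 0 ? values[pos] : Double.NaN;
    }

    int size() { return size; }

    public static void main(String[] args) {
        CompactSimilarityStore store = new CompactSimilarityStore();
        store.load(new BufferedReader(new StringReader("A,B,0.97\nA,C,0.50\nB,C,0.99\n")), 0.95);
        System.out.println(store.size()); // 2
        System.out.println(store.similarity("B", "A"));
    }
}
```

Storing floats instead of doubles trades a little precision for half the value storage, which is usually harmless for similarities used as weights.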
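Regarding speed: the exact work GenericItemBasedRecommender does per user is not shown here, but item-based recommenders typically estimate a preference for each candidate item i as a similarity-weighted average of the user's ratings, sum(s(i,j) * r(u,j)) / sum(|s(i,j)|), over the items j the user rated. For a user with 2000 items this is repeated for every candidate item, which is one plausible reason the time varies so much between users. The following minimal sketch (plain Java, not Taste's implementation; `NeighborhoodEstimator` is a hypothetical name) computes that estimate while looping over the smaller of the item's stored neighbors and the user's preferences.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Taste's implementation) of an item-based estimate
// that only touches the stored neighbors of the candidate item.
class NeighborhoodEstimator {
    // item ID -> (neighbor item ID -> similarity); only pairs above a threshold are stored
    private final Map<String, Map<String, Double>> neighbors = new HashMap<>();

    void addSimilarity(String a, String b, double sim) {
        neighbors.computeIfAbsent(a, k -> new HashMap<>()).put(b, sim);
        neighbors.computeIfAbsent(b, k -> new HashMap<>()).put(a, sim);
    }

    /**
     * Weighted average of the user's ratings of items similar to `item`:
     * sum(s(item, j) * r(u, j)) / sum(|s(item, j)|).
     * Returns NaN if none of the item's stored neighbors was rated by the user.
     */
    double estimate(Map<String, Double> userPrefs, String item) {
        Map<String, Double> row = neighbors.get(item);
        if (row == null) return Double.NaN;
        double num = 0.0, den = 0.0;
        // loop over the smaller map and probe the larger one
        boolean rowIsSmaller = row.size() <= userPrefs.size();
        Map<String, Double> small = rowIsSmaller ? row : userPrefs;
        Map<String, Double> large = rowIsSmaller ? userPrefs : row;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other == null) continue; // not both rated and similar
            double sim = rowIsSmaller ? e.getValue() : other;
            double pref = rowIsSmaller ? other : e.getValue();
            num += sim * pref;
            den += Math.abs(sim);
        }
        return den == 0.0 ? Double.NaN : num / den;
    }

    public static void main(String[] args) {
        NeighborhoodEstimator est = new NeighborhoodEstimator();
        est.addSimilarity("A", "B", 0.97);
        est.addSimilarity("A", "C", 0.96);
        Map<String, Double> prefs = new HashMap<>();
        prefs.put("B", 4.0);
        prefs.put("C", 2.0);
        // (0.97 * 4.0 + 0.96 * 2.0) / (0.97 + 0.96) = 5.8 / 1.93, about 3.005
        System.out.println(est.estimate(prefs, "A"));
    }
}
```

Keeping only pairs above the threshold keeps the neighbor rows short, which bounds the per-candidate cost; the trade-off is the one Question 2 describes, since too few stored neighbors can degrade the recommendations.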

So if you have any ideas how I could reduce the memory consumption and 
increase the recommendation speed, I would be very thankful.

Best regards

Thomas Rewig
