mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: slow for RMSE
Date Sat, 21 May 2011 22:58:03 GMT
Wrap your UserSimilarity in a CachingUserSimilarity. I think you're spending
a lot of time recomputing similarities.
You don't need a CachingRecommender.
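To see why a similarity cache helps, here is a minimal plain-Java sketch of what one does: each unordered user pair is computed once, and every later request for that pair is served from memory. `PairSimilarity`, `CachedPairSimilarity` and `CountingSimilarity` are hypothetical stand-ins, not Mahout classes. In the builder from the question, the real change would presumably be `new CachingUserSimilarity(new PearsonCorrelationSimilarity(model), model)` passed to both the neighborhood and the recommender, assuming your release has that `(UserSimilarity, DataModel)` constructor. Check the Javadoc for your version.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a user-user similarity metric. */
interface PairSimilarity {
  double similarity(long userA, long userB);
}

/** Memoizing wrapper: each unordered pair is computed at most once. */
final class CachedPairSimilarity implements PairSimilarity {
  private final PairSimilarity delegate;
  private final Map<String, Double> cache = new HashMap<String, Double>();

  CachedPairSimilarity(PairSimilarity delegate) {
    this.delegate = delegate;
  }

  public double similarity(long userA, long userB) {
    // Similarity is symmetric, so order the pair to share one cache entry.
    long lo = Math.min(userA, userB);
    long hi = Math.max(userA, userB);
    String key = lo + ":" + hi;
    Double cached = cache.get(key);
    if (cached == null) {
      cached = delegate.similarity(lo, hi);
      cache.put(key, cached);
    }
    return cached;
  }
}

/** Placeholder for an expensive metric such as Pearson; counts invocations. */
final class CountingSimilarity implements PairSimilarity {
  int calls = 0;

  public double similarity(long userA, long userB) {
    calls++;
    return 1.0 / (1 + Math.abs(userA - userB)); // dummy score, not Pearson
  }
}

class CachingDemo {
  public static void main(String[] args) {
    CountingSimilarity raw = new CountingSimilarity();
    PairSimilarity cached = new CachedPairSimilarity(raw);
    int users = 100;
    // Simulate repeated neighborhood queries: three passes in which every
    // user is compared with every other user.
    for (int pass = 0; pass < 3; pass++) {
      for (long u = 0; u < users; u++) {
        for (long v = 0; v < users; v++) {
          if (u != v) cached.similarity(u, v);
        }
      }
    }
    System.out.println("requests=" + 3 * users * (users - 1) + " computed=" + raw.calls);
  }
}
```

Running `CachingDemo` reports 29,700 similarity requests but only 4,950 actual computations, one per unordered pair of the 100 users.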

You can evaluate on a subset of the data by turning down that last "1.0"
parameter (the evaluation percentage) to something like 0.1.
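In the question's code this means changing the call to `evaluator.evaluate(builder, null, model, 0.9, 0.1)`, where 0.9 stays the training fraction. The plain-Java sketch below illustrates the idea of evaluating a random fraction of users by keeping each user independently with that probability. `UserSampling` and `sampleUsers` are hypothetical names, and it is an assumption that Mahout's evaluator samples in exactly this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch: keep each user with probability evaluationPercentage. */
final class UserSampling {
  static List<Long> sampleUsers(List<Long> userIds, double evaluationPercentage, Random random) {
    if (evaluationPercentage <= 0.0 || evaluationPercentage > 1.0) {
      throw new IllegalArgumentException("evaluationPercentage must be in (0, 1]");
    }
    List<Long> kept = new ArrayList<Long>();
    for (Long id : userIds) {
      // Independent coin flip per user, so the sample size is only about p * n.
      if (random.nextDouble() < evaluationPercentage) {
        kept.add(id);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Long> users = new ArrayList<Long>();
    for (long id = 1; id <= 2330; id++) { // 2330 users, as in the question
      users.add(id);
    }
    Random random = new Random(42L); // fixed seed for repeatable runs
    List<Long> sample = sampleUsers(users, 0.1, random);
    System.out.println("Evaluating " + sample.size() + " of " + users.size() + " users");
  }
}
```

With 0.1, roughly 230 of the 2330 users are evaluated, so the run should take roughly a tenth of the time.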

On Sat, May 21, 2011 at 11:08 PM, gj <gawesh@gmail.com> wrote:

> Hi,
> I'm new to Mahout. I'm using mahout-0.3 with Eclipse and jdk1.6.0_18 (no
> Hadoop).
> I'm trying to find the RMSE for a dataset, but it seems very slow: so far I
> have not been able to get the RMSE value for a single run. Hence, I was
> wondering if anybody could look at my setup and tell me what I am doing
> wrong, or why it is so slow.
>
> Here's my code:
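> import java.io.File;
>
> import org.apache.mahout.cf.taste.common.TasteException;
> import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
> import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
> import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
> import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
> import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
> import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
> import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
> import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
> import org.apache.mahout.cf.taste.model.DataModel;
> import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
> import org.apache.mahout.cf.taste.recommender.Recommender;
> import org.apache.mahout.cf.taste.similarity.UserSimilarity;
>
> // Enclosing class name is illustrative; the original excerpt omitted it.
> public class RmseEvaluation {
>
>   public static void main(String[] args) {
>     // Builds the recommender under test for each training split.
>     RecommenderBuilder builder = new RecommenderBuilder() {
>       public Recommender buildRecommender(DataModel model)
>           throws TasteException {
>         UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(model);
>         UserNeighborhood neighborhood =
>             new NearestNUserNeighborhood(5, userSimilarity, model);
>         Recommender recommender =
>             new GenericUserBasedRecommender(model, neighborhood, userSimilarity);
>         return new CachingRecommender(recommender);
>       }
>     };
>
>     RecommenderEvaluator evaluator = new RMSRecommenderEvaluator();
>     try {
>       DataModel model =
>           new FileDataModel(new File("lf_playhistory_step1_ratings.dat"));
>       // trainingPercentage = 0.9, evaluationPercentage = 1.0
>       double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);
>       System.out.println(score);
>     } catch (Exception e) {
>       // Catches any failure (I/O or TasteException), not only a missing file.
>       System.err.println("Evaluation failed: " + e.getMessage());
>     }
>   }
> }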
>
> The dataset has 5,462,701 <userid,track,rating> tuples:
> no. of tracks = 610,192
> no. of users = 2330
> ratings = 1 to 5
>
> This is the output I got on the console:
>
> 21-May-2011 22:26:51 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Creating FileDataModel for file lf_playhistory_step1_ratings.dat
> 21-May-2011 22:26:51 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Beginning evaluation using 0.9 of FileDataModel[dataFile:C:\eclipse_workspace\LastFM\lf_playhistory_step1_ratings.dat]
> 21-May-2011 22:26:51 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Reading file info...
> 21-May-2011 22:28:19 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 1000000 lines
> 21-May-2011 22:29:53 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 2000000 lines
> 21-May-2011 22:32:09 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 3000000 lines
> 21-May-2011 22:34:03 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 4000000 lines
> 21-May-2011 22:36:19 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 5000000 lines
> 21-May-2011 22:37:08 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Read lines: 5462701
> 21-May-2011 22:37:08 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Reading file info...
> 21-May-2011 22:37:16 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Read lines: 100000
> 21-May-2011 22:37:21 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 2330 users
> 21-May-2011 22:37:28 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Processed 2330 users
> 21-May-2011 22:37:29 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Beginning evaluation of 2323 users
> 21-May-2011 22:37:29 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Starting timing of 2323 tasks in 2 threads
> 21-May-2011 22:40:28 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Average time per recommendation: 178468ms
> 21-May-2011 22:40:28 org.slf4j.impl.JCLLoggerAdapter info
> INFO: Approximate memory used: 585MB / 840MB
>
> After that, I waited for two hours with no further output.
> "Average time per recommendation: 178468ms" seems very high. I'm
> guessing it's 178 sec x 2330 users = 4.8 days!
> This is running on my laptop (Intel Core 2 Duo T7500 @ 2.2 GHz, 2 GB RAM).
>
> Why is this taking so long? Is it too big a dataset? Is my laptop too slow?
>
> Can anybody help?
>
> Thanks,
> Gawesh
>
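The 4.8-day figure in the question can be checked with a little arithmetic. The sketch below assumes every evaluated user costs the logged average of 178,468 ms; it is a rough guess, not a Mahout measurement. The question's estimate treats users as sequential, while the log says 2323 users ran "in 2 threads", so if those threads overlapped perfectly (an assumption) the figure would roughly halve. `RuntimeEstimate` and `estimateDays` are hypothetical names.

```java
/** Back-of-envelope runtime estimate from the evaluator's logged timing. */
final class RuntimeEstimate {
  static double estimateDays(long avgMillisPerUser, int users, int threads) {
    // Assumes every user costs the logged average and threads overlap perfectly.
    double totalMillis = (double) avgMillisPerUser * users / threads;
    return totalMillis / (1000.0 * 60 * 60 * 24);
  }

  public static void main(String[] args) {
    long avg = 178468L; // "Average time per recommendation: 178468ms"
    System.out.printf("sequential, 2330 users:   %.2f days%n", estimateDays(avg, 2330, 1));
    System.out.printf("2 threads, 2323 users:    %.2f days%n", estimateDays(avg, 2323, 2));
    System.out.printf("2 threads, ~10%% of users: %.2f days%n", estimateDays(avg, 232, 2));
  }
}
```

Under these assumptions the run takes days either way. Cutting the evaluation percentage to 0.1 brings it to a fraction of a day, and caching similarities should lower the per-user average itself.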
