mahout-user mailing list archives

From Manuel Blechschmidt <Manuel.Blechschm...@gmx.de>
Subject Re: Mahout performance issues
Date Fri, 02 Dec 2011 11:20:09 GMT
Hello Daniel,

On 02.12.2011, at 12:02, Daniel Zohar wrote:

> Hi guys,
> 
> ...
> I just ran the fix I proposed earlier and I got great results! The query
> time was reduced to about a third for the 'heavy users'. Before it was 1-5
> secs and now it's 0.5-1.5 secs. The best part is that the accuracy level should
> remain exactly the same. I also believe it should reduce memory
> consumption, as the GenericBooleanPrefDataModel.preferenceForItems gets
> significantly smaller (in my case at least).

It would be great if you could measure your runtime performance and your accuracy with the
provided Mahout tools.

In your case, because you only have boolean feedback, precision and recall would make sense.

https://cwiki.apache.org/MAHOUT/recommender-documentation.html

RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
IRStatistics stats = evaluator.evaluate(builder, null, myModel, null, 3,
      GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
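
For boolean data, the builder in that snippet could for example be an
item-based recommender with LogLikelihoodSimilarity (just a minimal sketch,
not a recommendation of that particular similarity):

RecommenderBuilder builder = new RecommenderBuilder() {
	public Recommender buildRecommender(DataModel model) {
		// log-likelihood works without preference values, so it suits boolean data
		return new GenericItemBasedRecommender(model,
				new LogLikelihoodSimilarity(model));
	}
};

stats.getPrecision() and stats.getRecall() then give you the two numbers.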


Here is some example code of mine:

The imports for this snippet (MyModelImplementationDataModel is my own DataModel implementation):

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.RandomRecommender;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.UncenteredCosineSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public void testEvaluateRecommender() {
		try {
			DataModel myModel = new MyModelImplementationDataModel();
			
			// Users: 12858
			// Items: 5467
			// MaxPreference: 85850.0
			// MinPreference: 50.0
			System.out.println("Users: "+myModel.getNumUsers());
			System.out.println("Items: "+myModel.getNumItems());
			System.out.println("MaxPreference: "+myModel.getMaxPreference());
			System.out.println("MinPreference: "+myModel.getMinPreference());

			RecommenderBuilder randomBased = new RecommenderBuilder() {
				public Recommender buildRecommender(DataModel model) {
					// build and return the Recommender to evaluate here
					try {
						return new RandomRecommender(model);
					} catch (TasteException e) {
						// could not build the recommender; print the error and return null
						e.printStackTrace();
						return null;
					}
				}
			};

			RecommenderBuilder genericItemBased = new RecommenderBuilder() {
				public Recommender buildRecommender(DataModel model) {
					// build and return the Recommender to evaluate here
					try {
						return new GenericItemBasedRecommender(model,
								new PearsonCorrelationSimilarity(model));
					} catch (TasteException e) {
						// could not build the recommender; print the error and return null
						e.printStackTrace();
						return null;
					}
				}
			};

			RecommenderBuilder genericItemBasedCosine = new RecommenderBuilder() {
				public Recommender buildRecommender(DataModel model) {
					// build and return the Recommender to evaluate here
					try {
						return new GenericItemBasedRecommender(model,
								new UncenteredCosineSimilarity(model));
					} catch (TasteException e) {
						// could not build the recommender; print the error and return null
						e.printStackTrace();
						return null;
					}
				}
			};

			RecommenderBuilder genericItemBasedLikely = new RecommenderBuilder() {
				public Recommender buildRecommender(DataModel model) {
					// build and return the Recommender to evaluate here
					return new GenericItemBasedRecommender(model,
							new LogLikelihoodSimilarity(model));
				}
			};

			
			RecommenderBuilder genericUserBasedNN3 = new RecommenderBuilder() {
				public Recommender buildRecommender(DataModel model) {
					// build and return the Recommender to evaluate here
					try {
						return new GenericUserBasedRecommender(
								model,
								new NearestNUserNeighborhood(
										3,
										new PearsonCorrelationSimilarity(model),
										model),
								new PearsonCorrelationSimilarity(model));
					} catch (TasteException e) {
						// could not build the recommender; print the error and return null
						e.printStackTrace();
						return null;
					}
				}
			};

			RecommenderBuilder genericUserBasedNN20 = new RecommenderBuilder() {
				public Recommender buildRecommender(DataModel model) {
					// build and return the Recommender to evaluate here
					try {
						return new GenericUserBasedRecommender(
								model,
								new NearestNUserNeighborhood(
										20,
										new PearsonCorrelationSimilarity(model),
										model),
								new PearsonCorrelationSimilarity(model));
					} catch (TasteException e) {
						// could not build the recommender; print the error and return null
						e.printStackTrace();
						return null;
					}
				}
			};

			RecommenderBuilder slopeOneBased = new RecommenderBuilder() {
				public Recommender buildRecommender(DataModel model) {
					// build and return the Recommender to evaluate here
					try {
						return new SlopeOneRecommender(model);
					} catch (TasteException e) {
						// could not build the recommender; print the error and return null
						e.printStackTrace();
						return null;
					}
				}
			};

			RecommenderBuilder svdBased = new RecommenderBuilder() {
				public Recommender buildRecommender(DataModel model) {
					// build and return the Recommender to evaluate here
					try {
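						// ALSWRFactorizer(model, numFeatures, lambda, numIterations):
						// 100 latent features, regularization 0.3, 5 ALS iterations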
						return new SVDRecommender(model, new ALSWRFactorizer(
								model, 100, 0.3, 5));
					} catch (TasteException e) {
						// could not build the recommender; print the error and return null
						e.printStackTrace();
						return null;
					}
				}
			};

			// Data Set Summary:
			// 12858 users
			// 121304 preferences

			RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

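			// evaluate(builder, modelBuilder, model, trainingPercentage, evaluationPercentage):
			// train on 90% of each user's preferences, test against the held-out
			// 10%, and use 100% of the users for the evaluation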
			double evaluation = evaluator.evaluate(randomBased, null, myModel,
					0.9, 1.0);
			// Evaluation of randomBased (baseline): 43045.380570443434
			// (RandomRecommender(model))
			System.out.println("Evaluation of randomBased (baseline): "
					+ evaluation);

			// evaluation = evaluator.evaluate(genericItemBased, null, myModel,
			// 0.9, 1.0);
			// Evaluation of ItemBased with Pearson Correlation:
			// 315.5804958647985 (GenericItemBasedRecommender(model,
			// PearsonCorrelationSimilarity(model))
			// System.out
			// .println("Evaluation of ItemBased with Pearson Correlation: "
			// + evaluation);

			// evaluation = evaluator.evaluate(genericItemBasedCosine, null,
			// myModel, 0.9, 1.0);
			// Evaluation of ItemBased with uncentered Cosine: 198.25393235323375
			// (GenericItemBasedRecommender(model,
			// UncenteredCosineSimilarity(model)))
			// System.out
			// .println("Evaluation of ItemBased with Uncentered Cosine: "
			// + evaluation);
			
			evaluation = evaluator.evaluate(genericItemBasedLikely, null,
					myModel, 0.9, 1.0);
			// Evaluation of ItemBased with log likelihood: 176.45243607278724
			// (GenericItemBasedRecommender(model,
			// LogLikelihoodSimilarity(model)))
			System.out
					.println("Evaluation of ItemBased with LogLikelihood: "
							+ evaluation);
			
			

			// User based is slow and inaccurate
			// evaluation = evaluator.evaluate(genericUserBasedNN3, null,
			// myModel, 0.9, 1.0);
			// Evaluation of UserBased 3 with Pearson Correlation:
			// 1774.9897130330407 (GenericUserBasedRecommender(model,
			// NearestNUserNeighborhood(3, PearsonCorrelationSimilarity(model),
			// model), PearsonCorrelationSimilarity(model)))
			// took about 2 minutes
			// System.out.println("Evaluation of UserBased 3 with Pearson Correlation: "+evaluation);

			// evaluation = evaluator.evaluate(genericUserBasedNN20, null,
			// myModel, 0.9, 1.0);
			// Evaluation of UserBased 20 with Pearson
			// Correlation:1329.137324225053 (GenericUserBasedRecommender(model,
			// NearestNUserNeighborhood(20, PearsonCorrelationSimilarity(model),
			// model), PearsonCorrelationSimilarity(model)))
			// took about 3 minutes
			// System.out.println("Evaluation of UserBased 20 with Pearson Correlation: "+evaluation);

			// evaluation = evaluator.evaluate(slopeOneBased, null, myModel,
			// 0.9, 1.0);
			// Evaluation of SlopeOne: 464.8989330869532
			// (SlopeOneRecommender(model))
			// System.out.println("Evaluation of SlopeOne: "+evaluation);

			// evaluation = evaluator.evaluate(svdBased, null, myModel, 0.9,
			// 1.0);
			// Evaluation of SVD based: 378.9776153202042
			// (ALSWRFactorizer(model, 100, 0.3, 5))
			// took about 10 minutes to calculate on a MacBook Pro
			// System.out.println("Evaluation of SVD based: "+evaluation);

		} catch (TasteException e) {
			// print any error from the data model or an evaluation run
			e.printStackTrace();
		}

	}
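
For reference: AverageAbsoluteDifferenceRecommenderEvaluator reports the mean
absolute difference between predicted and actual preference values, so lower
is better; that is also why the random baseline comes out so large on a
50.0-85850.0 preference scale.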

> 
> The fix is merely adding two lines of code to one of
> the GenericBooleanPrefDataModel constructors. See
> http://pastebin.com/K5PB68Et; the lines I added are #11 and #22.
> 
> The only problem I see at the moment is that the similarity
> implementations use the number of users per item in the
> item-item similarity calculation. This _can_ be mitigated by creating an
> additional Map in the DataModel which maps itemID to numUsers.
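
Building that map once from the model should be cheap, roughly like this
(an untested sketch; numUsersPerItem is a made-up name):

FastByIDMap<Integer> numUsersPerItem = new FastByIDMap<Integer>();
LongPrimitiveIterator itemIDs = model.getItemIDs();
while (itemIDs.hasNext()) {
	long itemID = itemIDs.nextLong();
	// keep the full per-item user count even if preferenceForItems is sampled
	numUsersPerItem.put(itemID, model.getNumUsersWithPreferenceFor(itemID));
}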
> 
> What do you think about the proposed solution? Perhaps I am missing some
> other implications?
> 
> Thanks!
> 
> 
> On Fri, Dec 2, 2011 at 12:51 AM, Sean Owen <srowen@gmail.com> wrote:
> 
>> (Agree, and the sampling happens at the user level now -- so if you sample
>> one of these users, it slows down a lot. The spirit of the proposed change
>> is to make sampling more fine-grained, at the individual item level. That
>> seems to certainly fix this.)
>> 
>> On Thu, Dec 1, 2011 at 10:46 PM, Ted Dunning <ted.dunning@gmail.com>
>> wrote:
>> 
>>> This may or may not help much.  My guess is that the improvement will be
>>> very modest.
>>> 
>>> The most serious problem is going to be recommendations for anybody who has
>>> rated one of these excessively popular items.  That item will bring in a
>>> huge number of other users and thus a huge number of items to consider. If
>>> you down-sample ratings of the prolific users and kill super-common items,
>>> I think you will see much more improvement than simply eliminating the
>>> singleton users.
>>> 
>>> The basic issue is that cooccurrence-based algorithms have run time
>>> proportional to O(n_max^2), where n_max is the maximum number of items per
>>> user.
>>> 
>>> On Thu, Dec 1, 2011 at 2:35 PM, Daniel Zohar <dissoman@gmail.com> wrote:
>>> 
>>>> This is why I'm now looking into improving GenericBooleanPrefDataModel to
>>>> not take into account users who made only one interaction under the
>>>> 'preferenceForItems' Map. What do you think about this approach?
>>>> 
>>> 
>> 

-- 
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B

