mahout-user mailing list archives

From Sean Owen
Subject Re: Mahout Binary Recommender Evaluation
Date Wed, 27 Jul 2011 07:43:17 GMT
(This just posted to the list, but I believe it's a duplicate of a
message from several days ago. See my previous response.)

On Wed, Jul 27, 2011 at 8:33 AM, MT wrote:
> I'm working on a common dataset that includes the user id, item id, and
> timestamp (the moment the user bought the item). As there are no
> preferences, I needed a binary item-based recommender, which I found in
> Mahout (GenericBooleanPrefItemBasedRecommender and the Tanimoto
> coefficient). Following the recommender documentation, I tried to evaluate
> it with GenericRecommenderIRStatsEvaluator(), but I ran into a few problems.
> In fact, correct me if I'm wrong, but it seems to me that the evaluator will
> invariably give the same value for precision and recall. Since the items are
> all rated with the binary value 1.0, we give the evaluator a relevance
> threshold lower than 1, so for each user "at" items are considered relevant
> and removed from the user's preferences before "at" recommendations are
> computed. Precision and recall are then computed from the two sets, relevant
> and retrieved items, which both contain "at" items. This leads (I guess,
> unless the recommender cannot compute "at" items) to precision and recall
> being equal.
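
The equality described above can be checked with a small worked example. This is only a sketch using plain Java collections, not Mahout's own code; the class name PrecisionRecallSketch and the item IDs are made up for illustration, and it assumes the retrieved list has no duplicates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only (hypothetical class, not part of Mahout). */
class PrecisionRecallSketch {

    /** Counts retrieved items that are also in the relevant set. */
    static int hits(Set<Long> relevant, List<Long> retrieved) {
        int count = 0;
        for (long itemId : retrieved) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of retrieved items that are relevant. */
    static double precision(Set<Long> relevant, List<Long> retrieved) {
        return retrieved.isEmpty()
            ? Double.NaN
            : (double) hits(relevant, retrieved) / retrieved.size();
    }

    /** Fraction of relevant items that were retrieved. */
    static double recall(Set<Long> relevant, List<Long> retrieved) {
        return relevant.isEmpty()
            ? Double.NaN
            : (double) hits(relevant, retrieved) / relevant.size();
    }

    public static void main(String[] args) {
        // at = 3: three held-out purchases and three recommendations.
        Set<Long> relevant = new HashSet<>(Arrays.asList(10L, 11L, 12L));
        List<Long> retrieved = Arrays.asList(10L, 20L, 30L);
        // Same numerator (hits) and same denominator (3), so the values match.
        System.out.println(precision(relevant, retrieved));
        System.out.println(recall(relevant, retrieved));

        // If only two recommendations could be computed, the values diverge.
        List<Long> shorter = Arrays.asList(10L, 20L);
        System.out.println(precision(relevant, shorter));
        System.out.println(recall(relevant, shorter));
    }
}
```

Both metrics share the numerator (the number of hits), so they can differ only when the relevant and retrieved sets have different sizes.
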
> The results are still useful, though, since a precision of 0.2 tells us that
> among the "at" recommended items, 20% were actually bought by the user.
> Although one can wonder whether those items are the best recommendations, the
> least we can say is that they correspond in some way to the user's
> preferences.
> However, I had a few ideas to give more meaning to precision and recall that
> I wanted to share, to get some advice before implementing them.
> I read this topic and I fully understand that IRStatsEvaluator is different
> from classic evaluators (those giving the MAE, for example), but I feel that
> it makes sense to have a parameter trainingPercentage that divides each
> user's preferences into two subsets of items. The first (typically 20%) are
> considered relevant items, which are to be predicted using the second
> subset. At the moment this split is defined by "at", which often results in
> equal numbers of items in the relevant and retrieved subsets. The "at" value
> would still be a parameter, used to define the number of items retrieved. The
> evaluator could then be run while varying these two parameters to find the
> best compromise between precision and recall.
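
The proposed percentage-based split could be sketched roughly as follows. This is not something the Mahout evaluator does; the class HoldoutSplitSketch, the method split, and the parameters relevantFraction and seed are hypothetical names chosen for illustration, and a seeded shuffle is just one assumed way of picking the held-out items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative only (hypothetical class, not part of Mahout). */
class HoldoutSplitSketch {

    /**
     * Splits one user's item IDs into two lists.
     * Index 0 of the result is the training subset, index 1 the held-out
     * "relevant" subset whose size is relevantFraction of the input.
     */
    static List<List<Long>> split(List<Long> userItems,
                                  double relevantFraction,
                                  long seed) {
        // Copy before shuffling so the caller's list is left untouched.
        List<Long> shuffled = new ArrayList<>(userItems);
        Collections.shuffle(shuffled, new Random(seed));

        int relevantCount = (int) Math.round(shuffled.size() * relevantFraction);
        List<Long> relevant =
            new ArrayList<>(shuffled.subList(0, relevantCount));
        List<Long> training =
            new ArrayList<>(shuffled.subList(relevantCount, shuffled.size()));
        return Arrays.asList(training, relevant);
    }

    public static void main(String[] args) {
        List<Long> items = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L);
        List<List<Long>> parts = split(items, 0.2, 42L);
        System.out.println("training size: " + parts.get(0).size());
        System.out.println("relevant size: " + parts.get(1).size());
    }
}
```

With 10 purchases and a fraction of 0.2, two items are held out. Asking for "at" = 5 recommendations then caps precision at 2/5 while recall can still reach 1.0, so the two metrics are no longer tied together.
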
> Furthermore, should the dataset contain a timestamp for each purchase, would
> it not be logical to use the last items bought by the user as the test set?
> The evaluator would then follow what happens in real use.
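
The timestamp-based test set suggested above might look like the sketch below. Again, this is an assumption about how such a split could be written, not existing Mahout behaviour; TemporalHoldoutSketch, Purchase, and lastPurchased are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only (hypothetical class, not part of Mahout). */
class TemporalHoldoutSketch {

    /** One (item, time) record for a single user. */
    static final class Purchase {
        final long itemId;
        final long timestamp;

        Purchase(long itemId, long timestamp) {
            this.itemId = itemId;
            this.timestamp = timestamp;
        }
    }

    /** Returns the item IDs of the n most recent purchases, oldest first. */
    static List<Long> lastPurchased(List<Purchase> purchases, int n) {
        // Sort a copy by ascending timestamp.
        List<Purchase> sorted = new ArrayList<>(purchases);
        sorted.sort(Comparator.comparingLong((Purchase p) -> p.timestamp));

        // Keep only the tail of the sorted list as the test set.
        int from = Math.max(0, sorted.size() - n);
        List<Long> testItems = new ArrayList<>();
        for (Purchase p : sorted.subList(from, sorted.size())) {
            testItems.add(p.itemId);
        }
        return testItems;
    }

    public static void main(String[] args) {
        List<Purchase> purchases = Arrays.asList(
            new Purchase(1L, 100L),
            new Purchase(2L, 300L),
            new Purchase(3L, 200L));
        // The two latest purchases form the test set.
        System.out.println(lastPurchased(purchases, 2));
    }
}
```

Everything before the held-out tail would serve as training data, so the recommender only ever sees purchases made earlier than the ones it is asked to predict.
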
> Finally, I believe the documentation page has some mistakes in the last code
> excerpt:
> evaluator.evaluate(builder, myModel, null, 3,
> RecommenderIRStatusEvaluator.CHOOSE_THRESHOLD,
>        1.0);
> should be
> evaluator.evaluate(builder, null, myModel, null, 3,
> GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
> Thanks for your help!
