From Zia mel <ziad.kame...@gmail.com>
Subject Re: Choosing precision
Date Tue, 15 Jan 2013 18:03:29 GMT
Amazing answer!

What about these measures that appear a lot when evaluating recommenders? Are they
implemented in Mahout?
Mean Average Precision (MAP)
Mean Reciprocal Rank (MRR)
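
Just to check my understanding of MRR, here is a rough sketch of what I think it
computes (toy code, not from Mahout; the class name and the per-user recommendation
and relevance collections are made up for illustration):

import java.util.List;
import java.util.Set;

final class MrrSketch {
  // Toy sketch of Mean Reciprocal Rank: for each user, take the rank of the first
  // relevant item in that user's ranked recommendation list and average 1/rank
  // over all users. Users with no relevant item in the list contribute 0.
  static double meanReciprocalRank(List<List<Long>> recommendedPerUser,
                                   List<Set<Long>> relevantPerUser) {
    double sum = 0.0;
    for (int u = 0; u < recommendedPerUser.size(); u++) {
      List<Long> recs = recommendedPerUser.get(u);
      Set<Long> relevant = relevantPerUser.get(u);
      for (int rank = 1; rank <= recs.size(); rank++) {
        if (relevant.contains(recs.get(rank - 1))) {
          sum += 1.0 / rank;   // reciprocal rank of the first hit
          break;               // only the first relevant item counts
        }
      }
    }
    return recommendedPerUser.isEmpty() ? 0.0 : sum / recommendedPerUser.size();
  }
}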


Since we have RMS, would this code give a correct answer for MAP?

//** RMS code (excerpt from Mahout's RMSRecommenderEvaluator; other members omitted)
public final class RMSRecommenderEvaluator
    extends AbstractDifferenceRecommenderEvaluator {

  @Override
  protected void processOneEstimate(float estimatedPreference, Preference realPref) {
    double diff = realPref.getValue() - estimatedPreference;
    average.addDatum(diff * diff);   // accumulate squared error
  }

  @Override
  protected double computeFinalEvaluation() {
    return Math.sqrt(average.getAverage());   // root of the mean squared error
  }
}


//** MAP code (proposed; same class skeleton as above, only these two methods differ)
  @Override
  protected void processOneEstimate(float estimatedPreference, Preference realPref) {
    double diff = realPref.getValue() - estimatedPreference;
    average.addDatum(diff);
  }

  @Override
  protected double computeFinalEvaluation() {
    return average.getAverage();
  }
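
For comparison, my rough understanding is that average precision is normally computed
over the ranked recommendation list rather than from rating differences; a toy sketch
of what I mean (not Mahout code; the class name and relevance set are made up for
illustration):

import java.util.List;
import java.util.Set;

final class AveragePrecisionSketch {
  // Toy sketch of average precision for one user: walk the ranked recommendations
  // and, each time a relevant item appears, record precision at that cut-off;
  // MAP would be the mean of this value across users.
  static double averagePrecision(List<Long> rankedRecs, Set<Long> relevant) {
    int hits = 0;
    double sum = 0.0;
    for (int i = 0; i < rankedRecs.size(); i++) {
      if (relevant.contains(rankedRecs.get(i))) {
        hits++;
        sum += (double) hits / (i + 1);   // precision at rank i + 1
      }
    }
    return relevant.isEmpty() ? 0.0 : sum / relevant.size();
  }
}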

Have a nice day, Sean :)

On Tue, Jan 15, 2013 at 11:17 AM, Sean Owen <srowen@gmail.com> wrote:
> The best tests are really from real users. A/B test different
> recommenders and see which has better performance. That's not quite
> practical though.
>
> The problem is that you don't even know what the best recommendations
> are. Splitting the data by date is reasonable, but recent items aren't
> necessarily most-liked. Splitting by rating is more reasonable on this
> point, but you still can't conclude that there aren't better
> recommendations from among the un-rated items.
>
> Still it ought to correlate. I think you will find precision/recall are
> very low in most cases, often a few percent. The result is "noisy".
> AUC will tell you where all of those "best recommendations" in
> the test set fell in the ranked list, rather than only measuring the top
> N's performance. This tells you more, and I think that's generally
> good. However it is measuring performance over the entire list of
> recs, when you are unlikely to use more than the top N.
>
> Go ahead and use it since there's not a lot better you can do in the
> lab, but be aware of the issues.
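
P.S. Just to make sure I follow the AUC point: is it essentially measuring, for each
held-out "best" item, how high it lands in the full ranked list? A toy sketch of how I
picture it (not Mahout code; the class name and relevance set are made up for
illustration):

import java.util.List;
import java.util.Set;

final class AucSketch {
  // Toy AUC-style score for one user: the fraction of (relevant, non-relevant) pairs
  // in the ranked list where the relevant item is ranked above the non-relevant one.
  static double aucForUser(List<Long> rankedRecs, Set<Long> relevant) {
    long concordantPairs = 0;
    int relevantSeen = 0;
    int nonRelevantSeen = 0;
    for (Long item : rankedRecs) {
      if (relevant.contains(item)) {
        relevantSeen++;
      } else {
        nonRelevantSeen++;
        concordantPairs += relevantSeen;   // relevant items ranked above this one
      }
    }
    long totalPairs = (long) relevantSeen * nonRelevantSeen;
    return totalPairs == 0 ? 0.5 : (double) concordantPairs / totalPairs;
  }
}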
