mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Item Based Recommendation Evaluation based on Number of Preferences
Date Wed, 04 Jan 2012 04:56:28 GMT
That is the opposite of what you'd expect. The explanation you've identified
is possible, but it still seems unlikely to me; something else may be wrong.
Is the result repeatable, and not just a fluke of the random number
generator? What are the exact args you're using, just to make sure you're
really setting the percentages and such as you think?

If you have more data available, I'd indeed use more data, especially if it
more accurately reflects your real environment. You can try to exclude
these low-rank items, though that makes the test less representative of
reality, since those kinds of items do exist and are an issue. Which
ItemSimilarity are you using? Some, like log-likelihood, by nature already
account for these issues.
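For what it's worth, the log-likelihood ratio behind Mahout's LogLikelihoodSimilarity can be sketched in plain Java. The class below is a standalone illustration mirroring the shape of Mahout's LogLikelihood helper, not the library code itself; the class name is made up for the example:

```java
// Standalone sketch of the log-likelihood ratio (LLR) idea behind
// Mahout's LogLikelihoodSimilarity. Not the library code; a hypothetical
// class for illustration.
public final class LlrSketch {

  // x * ln(x), with the convention that 0 * ln(0) = 0.
  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized entropy of a set of counts.
  private static double entropy(long... counts) {
    long sum = 0;
    double result = 0.0;
    for (long c : counts) {
      result += xLogX(c);
      sum += c;
    }
    return xLogX(sum) - result;
  }

  // k11: users with both items, k12/k21: one but not the other,
  // k22: users with neither.
  public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    if (rowEntropy + columnEntropy < matrixEntropy) {
      return 0.0; // guard against floating-point rounding
    }
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
  }

  public static void main(String[] args) {
    // Perfect co-occurrence: strong evidence of association.
    System.out.println(logLikelihoodRatio(10, 0, 0, 10));
    // Counts consistent with independence: LLR is ~0.
    System.out.println(logLikelihoodRatio(5, 5, 5, 5));
  }
}
```

The relevant property here is that the LLR stays near zero unless the co-occurrence counts are large enough to be statistically surprising, which is why a handful of ratings on a rare item doesn't produce a spuriously high similarity.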

But yes, if you do want to go that way, you can use an IDRescorer to
exclude such items.
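A minimal sketch of that approach, assuming you precompute per-item preference counts from your DataModel. The nested interface below just mirrors the two methods of Mahout's IDRescorer so the example compiles on its own; MinSupportRescorer and withMinPreferences are hypothetical names, not Mahout API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a rescorer that filters items with too few preferences.
// The nested interface mirrors Mahout's IDRescorer (rescore/isFiltered);
// the class and factory-method names are made up for this example.
public class MinSupportRescorer {

  interface IDRescorer {
    double rescore(long id, double originalScore);
    boolean isFiltered(long id);
  }

  // Returns a rescorer that filters out any item with fewer than
  // minPrefs declared preferences; scores are left unchanged.
  static IDRescorer withMinPreferences(Map<Long, Integer> prefCountsByItem, int minPrefs) {
    return new IDRescorer() {
      @Override
      public double rescore(long id, double originalScore) {
        return originalScore; // we only filter, never re-rank
      }
      @Override
      public boolean isFiltered(long id) {
        return prefCountsByItem.getOrDefault(id, 0) < minPrefs;
      }
    };
  }

  public static void main(String[] args) {
    Map<Long, Integer> counts = new HashMap<>();
    counts.put(101L, 12); // well-rated item
    counts.put(102L, 2);  // low-rank item
    IDRescorer rescorer = withMinPreferences(counts, 5);
    System.out.println(rescorer.isFiltered(101L)); // false
    System.out.println(rescorer.isFiltered(102L)); // true
  }
}
```

With the real Mahout classes you'd pass such a rescorer to Recommender.recommend(userID, howMany, rescorer); note that since the evaluator itself doesn't take a rescorer, wrapping the recommender, as you suggested, is the usual route.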

On Wed, Jan 4, 2012 at 1:51 AM, Nick Jordan <nick@influen.se> wrote:

> Hi All,
>
> I'm currently running an item-based recommendation
> using KnnItemBasedRecommender.  My data set isn't very large:
> approximately 30k preferences over 10k items.  When running
> an AverageAbsoluteDifferenceRecommenderEvaluator evaluation with a 0.9
> training set, the result is ~0.80 (on a preference scale of 1-5).  When
> tuning that training set down to only 0.1, the mean difference is closer to
> 0.2.
>
> I assume that this number is actually lower because there are fewer
> recommendations that can actually be made.  Meaning that with the smaller
> training set there isn't enough similarity to make recommendations, and so
> those that it does make are more accurate.  So the question for me becomes:
> what does the evaluation look like when only providing recommendations for
> items with more than x declared preferences?  I'm wondering what the best
> way to determine this is.  Should I create a new recommender that will only
> return items with x or more preferences (maybe using IDRescorer?) or should
> I create a new evaluator to do something similar?  Is there a native method
> to accomplish this that I've missed?  Is my hypothesis just likely wrong?
>
> Appreciate the feedback.
>
> Nick
>
