mahout-user mailing list archives

From Sean Owen <>
Subject Re: Strange evaluation results for BookCrossingRecommender
Date Wed, 10 Mar 2010 14:58:55 GMT
Ah, good catch. I will adjust that.

I'm happy to make a new example for 'boolean' data, perhaps based on
BookCrossing. It would just ignore the rating data.
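Such an example would treat the mere presence of a (user, book) pair as the preference. A minimal pure-JDK sketch of that idea (the semicolon-separated line format mirrors the BookCrossing CSV; the field layout here is an assumption for illustration):

```java
import java.util.*;

/** Sketch: turn "user;item;rating" lines into boolean (user, item) preferences. */
public class BooleanPrefs {

    /** Keeps only the user and item fields, discarding the rating entirely. */
    static String toBooleanPref(String csvLine) {
        String[] f = csvLine.split(";");
        return f[0] + ";" + f[1];   // the pair's presence is the "preference"
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "276725;034545104X;0",   // implicit (unrated) interaction
            "276726;0155061224;5");  // explicit rating
        for (String line : lines) {
            // Rating value is ignored either way, so 0/10 "implicit" rows
            // carry the same signal as rated rows.
            System.out.println(toBooleanPref(line));
        }
    }
}
```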

On Wed, Mar 10, 2010 at 2:46 PM,  <> wrote:
> I think I found the explanation of the poor result and, maybe, the instability.
> More than 60% of the ratings are 0/10. This is what the publishers of this
> dataset call "implicit rating". It means that the book was read (or purchased)
> but not rated by the user.
> It seems that BookCrossingDataModel is not aware of that and just considers
> them as ratings of 0. It is therefore not surprising that results are inconsistent.
> An obvious way to solve the problem would be to filter out these implicit
> ratings.
> It would be interesting as well to change all ratings to "0" and to consider all
> of them as implicit. There is so far no Mahout example dedicated to
> recommendation based on binary data (user has bought item or not), even though
> this seems to me like a more common problem than recommendation based on actual
> ratings.
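The filtering suggested above could be a simple pre-pass over the input file before the data model is built. A rough pure-JDK sketch, assuming the same "user;item;rating" layout (field positions are an assumption):

```java
import java.util.*;
import java.util.stream.*;

/** Sketch: drop BookCrossing "implicit" ratings (value 0) before model building. */
public class ExplicitOnly {

    /** A rating of 0 means read/purchased but never rated, per the dataset docs. */
    static boolean isExplicit(String csvLine) {
        String[] f = csvLine.split(";");
        return Integer.parseInt(f[2]) > 0;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
            "276725;034545104X;0",
            "276726;0155061224;5",
            "276727;0446520802;0");
        List<String> explicit = raw.stream()
                                   .filter(ExplicitOnly::isExplicit)
                                   .collect(Collectors.toList());
        // With >60% of ratings implicit, most rows are dropped here.
        System.out.println(explicit.size() + " explicit of " + raw.size());
    }
}
```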
> Selon Sean Owen <>:
>> I see the same variance, but I believe it's due to a small input size.
>> At the moment it's using only 5% of the total input, or about 50,000
>> ratings over 5,000 users. That's fairly small. From there, it's also
>> looking at only 5% of those users to form neighborhoods. These are
>> just too low, and I have increased the amount of data the evaluation
>> uses in a few ways, and get much more stable results.
>> I also switched the algorithm it uses, since the average difference
>> was 4 out of 10, which is pretty poor. I think with more research one
>> could pick the optimal algorithm, but I just picked something that
>> worked a little better (< 3) for now.
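The "average difference" score being discussed is the mean absolute difference between estimated and actual ratings on the held-out data. A tiny self-contained sketch of that metric (the sample values are made up, not taken from the BookCrossing run):

```java
/** Sketch: average-absolute-difference score (MAE on held-out ratings). */
public class MaeSketch {

    static double mae(double[] actual, double[] estimated) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimated[i]);
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        double[] actual    = {8, 5, 10, 3};  // held-out true ratings (0-10 scale)
        double[] estimated = {6, 7,  8, 1};  // recommender's estimates
        // A score around 4 is poor on a 0-10 scale; under 3 is "a little better".
        System.out.println(mae(actual, estimated));
    }
}
```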
>> On Tue, Mar 9, 2010 at 6:30 PM, Sean Owen <> wrote:
>> > I see, that definitely doesn't sound right. Let me run it myself
>> > tonight when I am home and see what I observe.
>> >
>> > On Tue, Mar 9, 2010 at 5:40 PM,  <> wrote:
>> >> I did not change anything from the example provided in mahout-example,
>> >> development version. It uses 5% for evaluation, which is 5000 instances. With
>> >> such test set size, the range should not be that big. I suspect that there is
>> >> something wrong somewhere.
>> >
