mahout-user mailing list archives

From Greg H <gre...@gmail.com>
Subject Re: ItemSimilarityJob's results differ from non-distributed version
Date Fri, 25 Nov 2011 08:27:28 GMT
Hi Sebastian,

I converted the dataset by simply keeping all user/item pairs that had a
rating above 3. I'm also using GenericItemBasedRecommender's
mostSimilarItems method, rather than the recommend method, to make
recommendations.
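
A minimal sketch of that conversion, for anyone who wants to reproduce it. It assumes the input is comma-separated "userID,itemID,rating" lines and that the output should be bare "userID,itemID" pairs; the class name BinarizeRatings and the method keepAbove are hypothetical names used only for illustration, and the actual script may have differed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BinarizeRatings {

    // Keeps "userID,itemID" for every line whose rating is strictly above the threshold.
    // Assumed input format (not confirmed in the thread): userID,itemID,rating
    static List<String> keepAbove(List<String> lines, double threshold) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split(",");
            if (fields.length < 3) {
                continue; // skip blank or malformed lines
            }
            double rating = Double.parseDouble(fields[2].trim());
            if (rating > threshold) {
                kept.add(fields[0].trim() + "," + fields[1].trim());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Tiny made-up sample, only to show the call.
        List<String> ratings = Arrays.asList("1,10,5", "1,11,3", "2,10,4");
        for (String pair : keepAbove(ratings, 3.0)) {
            System.out.println(pair);
        }
    }
}
```

A rating of exactly 3 is dropped here, since the comparison is strictly "above 3".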

I'm certainly open to suggestions on better evaluation metrics. I'm just
using the top 5 because it was easy to implement.
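
For reference, precision over the top 5 can be sketched as below. This is one plausible reading, not necessarily the exact evaluation used here: it assumes a ranked list of recommended item IDs and a set of held-out relevant item IDs per user, and it divides by k even when fewer than k items are returned. PrecisionAtK and precisionAtK are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionAtK {

    // Fraction of the first k recommended items that appear in the relevant set.
    // Divides by k, so a list shorter than k is penalized (an assumption of this sketch).
    static double precisionAtK(List<Long> recommended, Set<Long> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        int limit = Math.min(k, recommended.size());
        int hits = 0;
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(recommended.get(i))) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    public static void main(String[] args) {
        // Made-up item IDs, only to show the call.
        List<Long> recommended = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(2L, 5L, 9L));
        System.out.println(precisionAtK(recommended, relevant, 5));
    }
}
```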

Thanks,
Greg

On Fri, Nov 25, 2011 at 4:03 PM, Sebastian Schelter <ssc@apache.org> wrote:

> Hi Greg,
>
> You should get the same results. Can you describe exactly how you
> converted the dataset? I'd like to try this myself; maybe you found a
> subtle bug.
>
> I also have doubts whether taking the precision of the top 5 recommended
> items is really a good quality measure.
>
> --sebastian
>
> On 25.11.2011 02:41, Greg H wrote:
> > Thanks for the replies Sebastian and Sean. I looked at the similarity
> > values and they are the same, but ItemSimilarityJob is calculating
> > fewer of them. So it must be still doing some sort of sampling. I
> > thought that I could force it to use all of the data by setting
> > maxPrefsPerUser sufficiently large. Could there be another reason for
> > it not to calculate all of the similarity values?
> >
> > I also tried to use a smaller value for similarItemsPerItem but this
> > leads to worse results.
> >
> > Thanks again,
> > Greg
> >
>
>
