mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: RecommenderJob in mahout-0.4 returning 1.0 score for each recommendation
Date Fri, 26 Nov 2010 18:32:57 GMT
This is because all the ratings are implicitly 1.0 when the input contains no rating values (boolean data).
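To see why, take the usual item-based estimate: a similarity-weighted average of the user's preference values. The sketch below assumes that textbook formula; it is not necessarily exactly what RecommenderJob computes. When every preference is 1.0 and the similarities are non-negative (as Tanimoto scores are), the weights cancel and the estimate is 1.0 no matter which similarity measure produced them.

```python
def estimate_preference(similarities, preferences):
    """Similarity-weighted average of a user's preferences for items related
    to a candidate item (a common item-based CF estimate, assumed here)."""
    numerator = sum(s * p for s, p in zip(similarities, preferences))
    denominator = sum(abs(s) for s in similarities)
    if denominator == 0:
        return float("nan")  # no usable similarity, so no estimate
    return numerator / denominator


# Boolean data: every preference is implicitly 1.0, so the
# similarities cancel out of the weighted average.
sims = [0.9, 0.2, 0.05]
print(estimate_preference(sims, [1.0, 1.0, 1.0]))  # prints 1.0
```

With real ratings (say 5.0 and 1.0 at equal weight) the same function returns 3.0, so the flat 1.0 comes from the data, not from the similarity class.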

But I actually think this is symptomatic of a problem, since I note
that those recommendations are quite suspiciously in order by item ID.
I am not sure the distributed recommender, in its current state,
handles boolean data correctly, but I am not an expert here --
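If every candidate scores 1.0, top-N selection has nothing to rank on, and the output order is decided entirely by tie-breaking. The hypothetical sketch below breaks ties on ascending item ID, which would reproduce the ascending-ID lists in the output; whether RecommenderJob's top-N step actually behaves this way is an assumption.

```python
def top_n(scores, n):
    """Return the n highest-scoring (item_id, score) pairs.
    Ties are broken by ascending item ID (a hypothetical tie-break)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# All candidates tie at 1.0, so the ranking degenerates to item-ID order.
scores = {180: 1.0, 3: 1.0, 11: 1.0, 1: 1.0, 168: 1.0, 2: 1.0}
print(top_n(scores, 4))  # [(1, 1.0), (2, 1.0), (3, 1.0), (11, 1.0)]
```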

Sebastian, can we discuss what might be going on here? In the
non-distributed code, items are given a "fake" estimated preference
which is not actually an estimated preference (because that would
always be 1.0) but some other number that functions as a score --
average similarity to other items, for example. This is used for
ranking and is also returned as an "estimated preference" even though
it isn't one.
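A minimal sketch of that kind of score, assuming the average-similarity rule suggested above: each candidate is ranked by its mean similarity to the items the user already has. The names `boolean_score`, `item_sims` and `lookup` are hypothetical, and this is an illustration of the idea, not Mahout's actual code.

```python
def boolean_score(candidate, user_items, similarity):
    """Score a candidate for boolean data as the average similarity between
    it and the items the user already has (illustrative rule only)."""
    sims = [similarity(candidate, item) for item in user_items if item != candidate]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical symmetric similarity table; unknown pairs count as 0.0.
item_sims = {(1, 3): 0.1, (2, 3): 0.4, (1, 4): 0.6, (2, 4): 0.2}


def lookup(a, b):
    return item_sims.get((min(a, b), max(a, b)), 0.0)


user_items = [1, 2]
candidates = [3, 4]
ranked = sorted(candidates, key=lambda c: boolean_score(c, user_items, lookup), reverse=True)
print(ranked)  # [4, 3]: item 4 averages 0.4, item 3 averages 0.25
```

Unlike the weighted-average estimate, this number varies with the similarities, so it can actually order the candidates.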

Can we do something like that here? Or is it already working this way
if certain values / options are set?

On Fri, Nov 26, 2010 at 6:26 PM, Jordi Abad <jordiabad82@gmail.com> wrote:
> Hi,
>
> I'm running a RecommenderJob (mahout-0.4 version) over hadoop like this:
>
> hadoop-0.20 jar /mahout-distribution-0.4/mahout-core-0.4-job.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> -Dmapred.input.dir=input -Dmapred.output.dir=output -s
> SIMILARITY_TANIMOTO_COEFFICIENT -b true
>
> The job works fine but when I examine the result I get things like:
>
> 12    [1:1.0,2:1.0,3:1.0,5:1.0,6:1.0,11:1.0,168:1.0,173:1.0,180:1.0,199:1.0]
> 14    [1:1.0,2:1.0,3:1.0,5:1.0,6:1.0,11:1.0,14:1.0,21:1.0,22:1.0,23:1.0]
> ...
>
> I can't understand why each recommendation gets a score of 1.0. It doesn't
> matter which SimilarityClass I set. I always get a score of 1.0.
>
> My input file is a "boolean file" (1391374 rows) with values like:
>
> 1,6496241
> 1,4368916
> 1,4922226
> 1,4958662
> ...
>
> If I run
> "org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob" job
> over the same file I get good results for items.
>
> Any ideas?
>
> Thanks in advance.
>
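Item-to-item similarity, by contrast, does not need rating values at all, which fits the report that ItemSimilarityJob gives good results on the same file. The sketch below parses "userID,itemID" lines like those in the input and computes the textbook Tanimoto coefficient, |A ∩ B| / |A ∪ B|, over the sets of users who have each item. Only the first two sample lines come from the input shown; the other lines are hypothetical, and Mahout's own implementation details may differ.

```python
from collections import defaultdict


def users_per_item(lines):
    """Parse 'userID,itemID' lines into a map from item ID to the set of
    users who have that item."""
    users = defaultdict(set)
    for line in lines:
        user_id, item_id = (int(field) for field in line.strip().split(","))
        users[item_id].add(user_id)
    return users


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two user sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# First two lines from the input above; users 2 and 3 are hypothetical.
lines = ["1,6496241", "1,4368916", "2,6496241", "2,4368916", "3,6496241"]
users = users_per_item(lines)
print(tanimoto(users[6496241], users[4368916]))  # 2 shared / 3 total users
```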
