mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: RecommenderJob in mahout-0.4 returning 1.0 score for each recommendation
Date Sun, 28 Nov 2010 10:37:24 GMT
Pearson correlation and boolean data don't fit together: all co-occurring
ratings have the value 1, so the compared vectors are constant (and
identical), their variance is zero, and no correlation can be computed.
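
To illustrate the point, here is a minimal Python sketch (not Mahout code; the function name `pearson` and the sample vectors are made up for illustration). With all-ones vectors the denominator of the Pearson formula is zero, so the result is undefined; the sketch returns NaN in that case.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length vectors; NaN if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalised; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        # A constant vector has zero variance, so the correlation is undefined.
        return float("nan")
    return cov / denom

# Boolean data: every co-occurring "rating" is 1.
boolean_a = [1.0, 1.0, 1.0]
boolean_b = [1.0, 1.0, 1.0]

# Real ratings (hypothetical values) vary, so a correlation exists.
rated_a = [1.0, 2.0, 3.0]
rated_b = [2.0, 4.0, 6.0]

boolean_result = pearson(boolean_a, boolean_b)
rated_result = pearson(rated_a, rated_b)
```

Every item pair then gets an undefined similarity, which is consistent with the empty output file.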

--sebastian

On 28.11.2010 11:28, Jordi Abad wrote:
> Hi,
>
> I applied the changes of MAHOUT-553 (thanks Sebastian!) against
> mahout-0.4. Everything makes sense now. I've tried it with different
> similarities (SIMILARITY_LOGLIKELIHOOD,
> SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_UNCENTERED_COSINE) and it
> works fine (i.e. I got good recommendations with different scores) but
> when I tried SIMILARITY_PEARSON_CORRELATION, I got an empty part-00000
> file. Is it normal?
>
> On Fri, Nov 26, 2010 at 7:50 PM, Sean Owen <srowen@gmail.com> wrote:
>
>     The behavior difference is fairly simple. Instead of a weighted
>     average of preferences (which will always equal 1.0), compute some
>     other function of those weights -- for example, the average of the
>     weights.
>
>     See GenericBooleanPrefItemBasedRecommender. It's actually just summing
>     the weights. This is nearly the same thing since the number of items
>     participating in the average is the same for all estimates. *Nearly*
>     the same since some can be NaN.
>
>     It's an open question whether there aren't better functions of the
>     weights to use, but this is a fine start, IMHO.
>
>
>     On Fri, Nov 26, 2010 at 6:45 PM, Sebastian Schelter <ssc@apache.org> wrote:
>     > Hi Sean,
>     >
>     > the prediction computation for boolean data is done in
>     > AggregateAndRecommendReducer.reduceBooleanData()
>     >
>     > It computes *all* possible items to recommend for the current user
>     > and then writes out only the first n, with n being the number
>     > specified in the --numRecommendations parameter given to
>     > RecommenderJob.
>     >
>     > Can you point me to the code where the non-distributed code handles
>     > the problem of ranking them? We could certainly emulate that
>     > behaviour in the distributed code too.
>     >
>     > --sebastian
>     >
>     >
>     >
>     > On 26.11.2010 19:35, Sean Owen wrote:
>     >> But is it then ranking the recommendations by the estimated pref?
>     >> If it's always 1, then the ordering is not meaningful.
>     >>
>     >> Maybe it is, I just haven't looked at your changes in much detail
>     >> since you made them although it looked broadly correct and proper.
>     >>
>     >> On Fri, Nov 26, 2010 at 6:33 PM, Sebastian Schelter <ssc@apache.org> wrote:
>     >>
>     >>> If all ratings have value 1 (because we use boolean data), the
>     >>> result of the prediction can also only be 1.
>     >>>
>     >
>     >
>
>
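
As a rough illustration of Sean Owen's quoted suggestion, the following Python sketch (not Mahout code; the function names and similarity values are hypothetical) contrasts the two scoring functions. With boolean data every preference is 1, so the weighted average collapses to 1.0 for every candidate item, while the sum of the similarity weights still distinguishes candidates. Skipping NaN similarities is an assumption of this sketch.

```python
import math

def weighted_average(prefs, sims):
    """Similarity-weighted average of the user's preferences."""
    return sum(p * s for p, s in zip(prefs, sims)) / sum(sims)

def summed_similarity(sims):
    """Sum of the similarity weights, ignoring NaN entries (assumed here)."""
    return sum(s for s in sims if not math.isnan(s))

# Boolean data: the user's preference for each already-seen item is 1.
prefs = [1.0, 1.0]

# Hypothetical similarities between two candidate items and the seen items.
sims_candidate_x = [0.9, 0.8]
sims_candidate_y = [0.2, 0.1]

avg_x = weighted_average(prefs, sims_candidate_x)
avg_y = weighted_average(prefs, sims_candidate_y)
sum_x = summed_similarity(sims_candidate_x)
sum_y = summed_similarity(sims_candidate_y)
```

Here `avg_x` and `avg_y` are both 1.0, so they cannot be used for ranking, whereas `sum_x` is larger than `sum_y` and yields a meaningful order.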

