mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Inconsistent recommendations
Date Wed, 03 Jun 2009 19:45:18 GMT
If you are using any of the 'samplingRate' parameters, then down in
the code it is using a random number generator to select some subset
of things to look at. That means you could get different results, due
to different neighborhoods, etc. on each request.

Is it bad behavior? Well:

1) If sampling rates aren't too low, the results shouldn't be very
different, even if they are not identical. So one conclusion could be
that sampling is having too large an effect and the rate needs to go up.

2) The assumption is that any of the slightly different results you
may get are about equally 'good' anyway.

3) I suppose I think of computing recommendations as a relatively
infrequent event. You might compute them once a day or once an hour.
Or you compute on the fly and cache the result, either externally or
in the framework. So it shouldn't be the case that the same
recommendations are computed over and over in a row, where the
differences might become noticeable to a user of an application.


Is it possible to guarantee the same recommendation, even when using
sampling, if the data doesn't change? Yes, it wouldn't be too hard:
always use a local RNG and always seed it the same way. It would be a
performance hit, though.
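
For illustration, here is a minimal sketch of the seeded local RNG
idea. The class name, method name and seed value are all hypothetical,
and this is not the actual Taste sampling code. It only shows why a
fresh generator with a fixed seed picks the same subset every time, as
long as the input is unchanged:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Taste.
final class SeededSampler {
    // Assumed constant seed; any value works as long as it never changes.
    private static final long SEED = 42L;

    private SeededSampler() {
    }

    /**
     * Keeps each item with probability samplingRate. A new Random is
     * created with the same seed on every call, so the same input list
     * and rate always produce the same sample.
     */
    static <T> List<T> sample(List<T> items, double samplingRate) {
        Random random = new Random(SEED); // local RNG, re-seeded per call
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            // nextDouble() is in [0, 1), so a rate of 1.0 keeps everything
            if (random.nextDouble() < samplingRate) {
                kept.add(item);
            }
        }
        return kept;
    }
}
```

Creating and seeding a generator on every call is extra work compared
with sharing one, which is presumably part of the cost.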

My first reaction though is #3 -- cache. Is that a feasible response?
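
To make the caching option concrete, here is a rough stand-alone
sketch. The RecommendationCache class and the compute function are
assumptions made up for this example, not the framework's own caching.
It ignores howMany and keys only on the user ID to keep things short:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical wrapper, not part of Taste.
final class RecommendationCache {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    // Stand-in for the real (possibly sampling-based) recommender call.
    private final Function<String, List<String>> compute;

    RecommendationCache(Function<String, List<String>> compute) {
        this.compute = compute;
    }

    /** Computes recommendations once per user and then reuses them. */
    List<String> recommend(String userID) {
        return cache.computeIfAbsent(userID, compute);
    }

    /** Call this when the underlying data changes. */
    void clear() {
        cache.clear();
    }
}
```

Repeated requests for the same user then return the same list until
clear() is called, whatever the sampling did on the first request.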


Sean



On Wed, Jun 3, 2009 at 8:29 PM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> Hello,
>
> I haven't debugged this yet, but I was playing with sampling rate in
> Taste and noticed a weird behaviour where the recommender doesn't give
> consistent results -- when it gives them they are always the same, but
> sometimes it doesn't give them.  For example:
>
>
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'    -- no recommendations from this call!
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'    -- no recommendations from this call!
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
>
> Another way to see this is if I use different sampling rates and
> collect output, like this:
> $ for x in `seq 1 1000`; do curl --silent 'http://localhost:8080/re/recommend?userID=u4&howMany=10'; done > (output file here)
>
> I get this:
>
> -rw-r--r-- 1 otis otis 5994 2009-06-03 15:24 out-1-sr0.8
> -rw-r--r-- 1 otis otis 5988 2009-06-03 15:24 out-2-sr0.8    -- different outputs!
>
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:23 out-1-sr0.9
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:23 out-2-sr0.9
>
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:22 out-1-sr0.99
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:22 out-2-sr0.99
>
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:20 out-1-sr1.0
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:21 out-2-sr1.0
>
> If this worked consistently, the outputs should be identical, no?
>
> This doesn't look normal...bug?
> I'm attaching my sample input (but ML software may strip it).
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
