mahout-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Inconsistent recommendations
Date Wed, 03 Jun 2009 20:00:45 GMT

I see.  I thought sampling rate was only about providing a way to skip some input records
(user, item, preference tuples) to lower memory requirements and increase speed.  I didn't
realize it could affect recommendation computation...
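To make the effect concrete, here is a minimal Python sketch of sampling with a random number generator. It is not Taste's actual code: `sample_records`, the made-up tuples, and the seed values are all assumptions for illustration. It shows why two requests can see different subsets of the data, and why a local generator seeded the same way each time would give a repeatable subset.

```python
import random

def sample_records(records, sampling_rate, rng):
    """Keep each record with probability `sampling_rate` (hypothetical helper)."""
    return [r for r in records if rng.random() < sampling_rate]

# Made-up (user, item, preference) tuples, for illustration only.
records = [("u%d" % (i % 5), "a%d" % (i % 7), float(i % 3 + 1)) for i in range(200)]

# A fresh local generator per request, seeded identically: same subset every time.
first = sample_records(records, 0.8, random.Random(42))
second = sample_records(records, 0.8, random.Random(42))

# A generator in a different state (here: a different seed) will generally
# keep a different subset, so anything computed from it can differ too.
other = sample_records(records, 0.8, random.Random(7))

# With a rate of 1.0 nothing is skipped, so the generator has no influence.
full = sample_records(records, 1.0, random.Random())

print(len(first), len(second), len(other), len(full))
```

If the subset that happens to be kept leaves a user without enough overlapping data, a recommender built on it could plausibly return nothing for that request, which would match the occasional empty responses.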

3) is definitely needed, at least in my case, and that's what I do.  Big time. :)
2) is also good to know - if different sets of recommended items all look good (i.e. really
do feel like good recommendations) to users, this adds variety, and I feel that can be a good
thing, at least in my current domain.
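The caching in 3) can be sketched roughly as follows, again in Python and again as an assumption-laden illustration rather than anything from Taste: `ResultCache` and `noisy_recommend` are hypothetical names, and the shuffle merely stands in for a recommender whose output varies between calls because of sampling.

```python
import random

class ResultCache:
    """Remembers the first result computed for each (user_id, how_many) pair."""

    def __init__(self, recommend):
        self._recommend = recommend   # the underlying, possibly noisy, function
        self._cache = {}
        self.computations = 0         # how often the underlying function ran

    def recommend(self, user_id, how_many):
        key = (user_id, how_many)
        if key not in self._cache:
            self.computations += 1
            self._cache[key] = self._recommend(user_id, how_many)
        # Return a copy so callers cannot modify the cached list.
        return list(self._cache[key])

    def clear(self):
        """Drop all cached results, e.g. after the input data changes."""
        self._cache.clear()

def noisy_recommend(user_id, how_many):
    # Stand-in for a sampled recommender: the order varies from call to call.
    items = ["a1", "a2", "a3", "a4"]
    random.shuffle(items)
    return items[:how_many]

cached = ResultCache(noisy_recommend)
results = [cached.recommend("u4", 2) for _ in range(5)]
print(results, cached.computations)
```

Repeated requests for the same user then see one consistent answer until the cache is cleared, so the variation between computations only shows up after a refresh.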

So I suppose I simply can't have the sampling rate too low.  Thanks Owen.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Sean Owen <srowen@gmail.com>
> To: mahout-user@lucene.apache.org
> Sent: Wednesday, June 3, 2009 3:45:18 PM
> Subject: Re: Inconsistent recommendations
> 
> If you are using any of the 'samplingRate' parameters, then down in
> the code it is using a random number generator to select some subset
> of things to look at. That means you could get different results, due
> to different neighborhoods, etc. on each request.
> 
> Is it bad behavior? Well:
> 
> 1) If sampling rates aren't too low, the results shouldn't be very
> different, even if they are not identical. So one conclusion could be
> sampling is having too large an effect and the rate needs to go up
> 
> 2) The assumption is that any of the slightly different results you
> may get are about equally 'good' anyway
> 
> 3) I suppose I think of computing recommendations as a
> relatively-speaking infrequent event. You might compute them once a
> day or hour. Or you compute on the fly and cache it, either externally
> or in the framework. So, it shouldn't be the case that the same
> recommendations are computed over and over in a row, where the
> differences might become noticeable, in an application, to a user
> 
> 
> Is it possible to guarantee the same recommendations, even when using
> sampling, if the data doesn't change? It wouldn't be too hard: always
> use a local RNG and always seed it the same way. It would be a
> performance hit, though.
> 
> My first reaction though is #3 -- cache. Is that a feasible response?
> 
> 
> Sean
> 
> 
> 
> On Wed, Jun 3, 2009 at 8:29 PM, Otis Gospodnetic wrote:
> > Hello,
> >
> > I haven't debugged this yet, but I was playing with sampling rate in Taste and
> > noticed a weird behaviour where the recommender doesn't give consistent results
> > -- when it gives them they are always the same, but sometimes it doesn't give
> > them.  For example:
> >
> >
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> > a1
> > a2
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> > a1
> > a2
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'    -- no recommendations from this call!
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> > a1
> > a2
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'    -- no recommendations from this call!
> > $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> > a1
> > a2
> >
> > Another way to see this is if I use different sampling rates and collect
> > output, like this:
> > $ for x in `seq 1 1000`; do curl --silent 'http://localhost:8080/re/recommend?userID=u4&howMany=10'; done > (output file here)
> >
> > I get this:
> >
> > -rw-r--r-- 1 otis otis 5994 2009-06-03 15:24 out-1-sr0.8
> > -rw-r--r-- 1 otis otis 5988 2009-06-03 15:24 out-2-sr0.8    -- different outputs!
> >
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:23 out-1-sr0.9
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:23 out-2-sr0.9
> >
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:22 out-1-sr0.99
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:22 out-2-sr0.99
> >
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:20 out-1-sr1.0
> > -rw-r--r-- 1 otis otis 6000 2009-06-03 15:21 out-2-sr1.0
> >
> > If this worked consistently, the outputs should be identical, no?
> >
> > This doesn't look normal...bug?
> > I'm attaching my sample input (but ML software may strip it).
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >

