mahout-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Inconsistent recommendations
Date Thu, 02 Jul 2009 16:32:05 GMT

Thanks Ted, I got it working now.
I was casting to floats just to make the output easier on the eyes for emailing.
I was missing Math.exp(...) and once I had that in, it all started working.
It then became obvious that the final output was not a rank, but a score that determines the output ordering.

I even put that in a separate class and called it TedsJitter (implements Jitter, since I am
trying some other jittering approaches).

 Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Ted Dunning <ted.dunning@gmail.com>
> To: mahout-user@lucene.apache.org
> Sent: Tuesday, June 30, 2009 5:35:36 PM
> Subject: Re: Inconsistent recommendations
> 
> Otis,
> 
> There are several substantive problems with your code, mostly due, I am
> sure, to my posting R code, which is unfamiliar.  The most important one that
> I see offhand is that the exponential random variable must be defined as:
> 
>          - Math.log(1 - Math.random())
> 
> The idea is that the argument to log must be in the range (0, 1] so that the
> result will be in the range [0, inf).  It is written as 1 - Math.random()
> because the range of Math.random() is [0, 1) rather than (0, 1].
> 
> I have a few style beefs that I hope you will take in good humor as well.
> 
> I will make comments about both of these in-line.
> 
> On Tue, Jun 30, 2009 at 10:27 AM, Otis Gospodnetic <
> otis_gospodnetic@yahoo.com> wrote:
> 
> >
> >    // exp(-n/5) + rexp() * 0.1
> >    for (int i=1; i < 20; i++) {
> 
> 
> You should use double for all of this code.  Otherwise it may be
> considerably slower than desired due to float/double conversions.  Also,
> since we are taking exp of some potentially good-sized numbers, it is very
> easy to run out of dynamic range with floats, leading to very surprising
> results.  This is essentially a style question, but I find this kind of
> premature optimization of floating point arithmetic to be a very bad idea.
> 
> >      float exp = (float) i / 5;    // not sure why you used -i/n
> 
> 
> -i/n was used because that will lead to doing exp(negative number).  For
> large negative numbers, the slope of exp() becomes very flat which makes
> large rearrangements possible.  Without the negation, the randomization will
> have a very different effect.
> 
> >      float rexp = (float) Math.log(i-Math.random());   // tried with 1 instead of i like you said, too
> 
> 
> The 1 is critical as mentioned above.
> 
> >      float rank = exp + rexp * 0.1f;
> 
> 
> I don't see a call to Math.exp anywhere.  Perhaps it got lost?  That would
> probably explain a large part of the problems.  Also, this is not the rank,
> but rather the synthetic score.  Thus this is a misleading name.
> 
> >      float round = Math.round(rank);
> 
> 
> I don't think that you want to round like this.  Instead, what you should be
> doing is accumulating the scores in an array and then sorting the scores.
> What I displayed was a permutation that resulted from sorting.
> 
> 
> >
> >      System.out.println("EXP: " + exp + "\tREXP: " + rexp + "\tRANK: " +
> > rank + "\tROUND: " + round);
> >    }
> >
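
Pulling Ted's points together, a minimal Java sketch of the jitter might look like the following. It uses double throughout, draws the exponential variate as -log(1 - U), applies Math.exp to -i/5, and sorts by the synthetic score instead of rounding it. The class and method names (JitterSketch, rexp, score, jitteredOrder), the use of java.util.Random in place of Math.random() so that runs can be seeded, and the descending sort direction are assumptions made for illustration; they are not taken from the thread or from Mahout.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class JitterSketch {

    // Exponential(1) variate by inversion. nextDouble() is in [0, 1),
    // so 1 - nextDouble() is in (0, 1] and the log is always finite.
    static double rexp(Random rng) {
        return -Math.log(1.0 - rng.nextDouble());
    }

    // Synthetic score for the item at 1-based position i:
    // exp(-i/5) + rexp() * 0.1, as in the comment in the quoted code.
    static double score(int i, Random rng) {
        return Math.exp(-i / 5.0) + 0.1 * rexp(rng);
    }

    // Returns the original 1-based positions, reordered by score.
    // Sorting highest score first is an assumption of this sketch.
    static int[] jitteredOrder(int n, Random rng) {
        double[] scores = new double[n];
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) {
            scores[k] = score(k + 1, rng);
            order[k] = k + 1;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer p) -> -scores[p - 1]));
        int[] result = new int[n];
        for (int k = 0; k < n; k++) {
            result[k] = order[k];
        }
        return result;
    }

    public static void main(String[] args) {
        // 19 items, matching the loop bounds (i = 1 .. 19) in the quoted code.
        System.out.println(Arrays.toString(jitteredOrder(19, new Random())));
    }
}
```

The printed array is a permutation of 1..19. Because exp(-i/5) flattens out as i grows, items far down the list are more likely to swap places than items near the top.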

