mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: sampling bestseller buyers for recommendations
Date Mon, 26 Dec 2011 22:25:00 GMT
Log-likelihood is very much like PMI (but better).

This is a general recommendation problem, but it should stop being a problem
once you use the log-likelihood ratio (LLR).  It is easy to show that any
item that cooccurs with everything gets a score of zero under LLR.
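
To make the zero-score claim concrete, here is a minimal sketch of the LLR
computation on a 2x2 cooccurrence table, written in the entropy form.  The
function names and the example counts are assumptions made up for
illustration; this is not taken from the Mahout source.

```python
import math

def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts, in nats
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users with both A and B    k12: users with A but not B
    k21: users with B but not A     k22: users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))

# Hypothetical data: 1000 users, item A downloaded by 50 of them,
# item B downloaded by every user, so B cooccurs with everything.
score_universal = llr(50, 0, 950, 0)

# Hypothetical data: A and B always appear together and never apart.
score_associated = llr(10, 0, 0, 10)
```

With these counts, score_universal comes out as zero: when B is in every
user's history, the column entropy vanishes and the row entropy equals the
matrix entropy, so nothing is left over.  score_associated is clearly
positive.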

It may also be that these common items are prevalent in distinct
sub-populations.  In that case, you may actually have some strong signal
there, and downsampling common items and prolific consumers is very much a
good idea.

Downsampling is better in most cases than reweighting because it has pretty
much the same effect but makes things run much faster as well.  You might
as well get both benefits at once.
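
As a rough sketch of what such downsampling could look like as a
preprocessing step: cap the number of interactions kept per user, then cap
the number kept per item, sampling uniformly at random when a cap is
exceeded.  The function names, the (user, item) pair format, and the cap
values are assumptions for illustration, not anything Mahout prescribes.

```python
import random
from collections import defaultdict

def cap_per_key(pairs, key_index, cap, rng):
    # Group pairs by user (key_index=0) or item (key_index=1) and keep
    # at most `cap` randomly chosen pairs from each group.
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair[key_index]].append(pair)
    kept = []
    for group in groups.values():
        if len(group) > cap:
            group = rng.sample(group, cap)
        kept.extend(group)
    return kept

def downsample(interactions, max_per_user, max_per_item, seed=0):
    """Limit prolific users first, then very common items.

    interactions: iterable of (user, item) pairs.
    Capping items second can only shrink per-user counts further,
    so both caps hold in the result.
    """
    rng = random.Random(seed)
    pairs = list(dict.fromkeys(interactions))  # drop exact duplicates
    pairs = cap_per_key(pairs, 0, max_per_user, rng)
    pairs = cap_per_key(pairs, 1, max_per_item, rng)
    return pairs

# Hypothetical data: one item downloaded by 100 users, plus one user
# who downloaded 50 other items.
data = [(u, "hit") for u in range(100)] + [(0, i) for i in range(50)]
sampled = downsample(data, max_per_user=10, max_per_item=20)
```

The cooccurrence counts are then computed on the sampled pairs only, which
is where the speedup comes from: the cost of counting cooccurrences grows
roughly with the square of the longest user history.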

On Mon, Dec 26, 2011 at 2:20 PM, Valentin Pletzer <pletzer@gmail.com> wrote:

> I am already using Log-likelihood. But since the items are free downloads
> some items tend to cooccur very often with nearly every other item. So
> maybe my problem isn't a Mahout problem but a more general recommendation
> problem?
>
> I am thinking about some dampening factor for very popular items or
> something similar to PMI (
> http://en.wikipedia.org/wiki/Pointwise_mutual_information)
>
> On Mon, Dec 26, 2011 at 11:07 PM, Sean Owen <srowen@gmail.com> wrote:
>
> > What item similarity metric are you using? Log-likelihood tends to
> > account for an item's baseline popularity and normalize it away. So a
> > best-seller isn't similar to an item just because it's a best-seller
> > and shows up a lot, but because it shows up an unusually large number
> > of times, even granting it's a best seller. Try that if you're not
> > already using it.
> >
> > On Mon, Dec 26, 2011 at 4:01 PM, Valentin Pletzer <pletzer@gmail.com>
> > wrote:
> > > Hi,
> > >
> > > I am trying to achieve some item-to-item-recommendations and the setup
> > > works quite well. But one thing I stumbled across is that some items are
> > > so popular that they are a recommendation for nearly every other item. In
> > > the Amazon paper they say that they are sampling the bestseller buying
> > > customers. Do I have to do this preprocessing step myself or does Mahout
> > > help with that?
> > >
> > > Thanks
> > > Valentin
> >
>
