mahout-user mailing list archives

From "j.barrett Strausser" <j.barrett.straus...@gmail.com>
Subject Re: Number of features for ALS
Date Thu, 27 Mar 2014 18:07:13 GMT
Thanks Ted,

Yes on the time problem. We tend to use aggregations of session data, so
instead of asking for user recommendations we do things like user+session
recommendations.

Of course, deciding when sessions start and stop isn't trivial. Ideally,
what I would want to do is time-weight views using a kernel or convolution.
That's a bit heavy, so we typically have a global model that is basically
all preferences over time, plus these user+session type models. We can
then combine these at another level to give recommendations based on what
you like throughout time versus what you have been doing recently.
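
To make the two ideas above concrete, here is a minimal sketch, assuming an
exponential-decay kernel for the time weighting and a simple linear blend of
the global and session scores. The function names, the half-life, and the
alpha weight are all hypothetical choices for illustration; none of this is
Mahout API or a description of our production setup.

```python
import math
from collections import defaultdict


def time_weighted_prefs(views, now, half_life_days=30.0):
    """Sum exponentially decayed weights per (user, item).

    views: iterable of (user, item, timestamp_in_days).
    A view exactly one half-life old contributes 0.5; a fresh view 1.0.
    The half-life is an assumed tuning knob, not a recommended value.
    """
    decay = math.log(2) / half_life_days
    prefs = defaultdict(float)
    for user, item, t in views:
        age = max(0.0, now - t)  # clamp views stamped in the future
        prefs[(user, item)] += math.exp(-decay * age)
    return dict(prefs)


def blend(global_scores, session_scores, alpha=0.5):
    """Linear blend of per-item scores from the two models.

    alpha is the weight on the recent (session) model; items missing
    from one model are scored 0.0 by that model.
    """
    items = set(global_scores) | set(session_scores)
    return {
        i: (1 - alpha) * global_scores.get(i, 0.0)
        + alpha * session_scores.get(i, 0.0)
        for i in items
    }
```

The weighted preferences could be fed to the factorizer in place of raw view
counts, and raising alpha shifts the final ranking toward recent behavior.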



-b


On Thu, Mar 27, 2014 at 1:59 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> For the poly-syllable challenged,
>
> heteroscedasticity - degree of variation changes.  This is common with
> counts because you expect the standard deviation of count data to be
> proportional to sqrt(n).
>
> time inhomogeneity - changes in behavior over time.  One way to handle this
> (roughly) is to first remove variation in personal and item means over time
> (if using ratings) and then to segment user histories into episodes.  By
> including both short and long episodes you get some repair for changes in
> personal preference.  A great example of how this works/breaks is Christmas
> music.  On December 26th, you want to *stop* recommending this music so it
> really pays to limit histories at this point.  By having an episodic user
> session that starts around November and runs to Christmas, you can get good
> recommendations for seasonal songs and not pollute the rest of the
> universe.
>
>
>
> On Thu, Mar 27, 2014 at 8:30 AM, j.barrett Strausser <
> j.barrett.strausser@gmail.com> wrote:
>
> > For my team it has usually been heteroscedasticity and time
> > inhomogeneity.
> >
> >
> >
> >
> > On Thu, Mar 27, 2014 at 10:18 AM, Tevfik Aytekin
> > <tevfik.aytekin@gmail.com> wrote:
> >
> > > Interesting topic,
> > > Ted, can you give examples of those mathematical assumptions
> > > under-pinning ALS which are violated by the real world?
> > >
> > > On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning <ted.dunning@gmail.com>
> > > wrote:
> > > > How can there be any other practical method?  Essentially all of the
> > > > mathematical assumptions under-pinning ALS are violated by the real
> > > > world.  Why would any mathematical consideration of the number of
> > > > features be much more than heuristic?
> > > >
> > > > That said, you can make an information content argument.  You can
> > > > also make the argument that if you take too many features, it doesn't
> > > > much hurt so you should always take as many as you can compute.
> > > >
> > > >
> > > >
> > > > On Thu, Mar 27, 2014 at 6:33 AM, Sebastian Schelter <ssc@apache.org>
> > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> does anyone know of a principled approach of choosing the number of
> > > >> features for ALS (other than cross-validation?)
> > > >>
> > > >> --sebastian
> > > >>
> > >
> >
> >
> >
> > --
> >
> >
> > https://github.com/bearrito
> > @deepbearrito
> >
>



-- 


https://github.com/bearrito
@deepbearrito
