mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: ALS, weighted vs. non-weighted regularization paper
Date Mon, 16 Jun 2014 22:11:07 GMT
yeah, so that was my best guess as well: nothing to do with regularization,
just importance weighting.

The reason I was asking is that I had traditionally included "do WR / do not
do WR" as a training parameter, but wasn't sure it made much sense. Now I was
revisiting this for M-1365 again. I guess I will leave it as is, with "do WR"
on by default.


On Mon, Jun 16, 2014 at 2:27 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> It may actually be that they weren't solving the problem they thought they
> were.  By regularizing prolific users more vigorously, they may actually
> have just been down-weighting them.
>
> We effectively do the same in ISJ by down-sampling the data.  It is very
> important to do so, but not because of regularization.  The real reason is
> that the most prolific users are soooo prolific and soooo odd.  The reason
> that they appear unhinged is that they are often bots or QA teams.
> Weighting the behavior of these users highly is a recipe for disaster.
>
>
> On Mon, Jun 16, 2014 at 1:28 PM, Sean Owen <srowen@gmail.com> wrote:
>
> > Yeah, I've turned that over in my head. I am not sure I have a great
> > answer. But I interpret the net effect to be that the model prefers
> > simple explanations for active users, at the cost of more error in the
> > approximation. One would rather pick a basis that more naturally
> > explains the data observed for active users. I think I can see how this
> > could be a useful assumption: these users' data is much less sparse.
> >
> >
> > On Mon, Jun 16, 2014 at 8:50 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> > wrote:
> > > Probably a question for Sebastian.
> > >
> > > As we know, the two papers (Hu-Koren-Volinsky and Zhou et al.) use
> > > slightly different loss functions.
> > >
> > > Zhou et al. are fairly unique in that they additionally multiply the
> > > norms of the U, V vectors by the number of observed interactions.
> > >
> > > The paper doesn't explain why this works, beyond saying something along
> > > the lines of "we tried several regularization matrices, and this one
> > > worked better in our case".
> > >
> > > I tried to figure out why that is, and I am still not sure why it would
> > > be better. So basically we are saying that, by giving smaller observation
> > > sets smaller regularization values, it is ok for smaller observation sets
> > > to overfit slightly more than larger observation sets.
> > >
> > > This seems counterintuitive. Intuition tells us that smaller sets would
> > > actually tend to overfit more, not less, and therefore might warrant a
> > > larger regularization rate, not a smaller one. Sebastian, what's your
> > > take on weighting the regularization in ALS-WR?
> > >
> > > thanks.
> > > -d
> >
>
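
P.S. A minimal, hypothetical sketch of the kind of per-user down-sampling Ted
describes above: cap how many interactions any single user can contribute
before factorization. The case class, the cap, and the uniform sampling are
illustrative only, not what Mahout actually does.

import scala.util.Random

// Hypothetical sketch, not Mahout code: cap the number of interactions kept
// per user so that extremely prolific users (often bots or QA accounts)
// cannot dominate the factorization.
case class Interaction(userId: Long, itemId: Long, value: Double)

def downsamplePerUser(interactions: Seq[Interaction],
                      maxPerUser: Int,
                      rng: Random = new Random(42)): Seq[Interaction] =
  interactions
    .groupBy(_.userId)
    .values
    .flatMap { perUser =>
      if (perUser.size <= maxPerUser) perUser
      else rng.shuffle(perUser).take(maxPerUser) // uniform subset for very active users
    }
    .toSeq

// usage: keep at most, say, 500 interactions per user (the cap is made up)
// val sampled = downsamplePerUser(allInteractions, maxPerUser = 500)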
