Mailing-List: contact user-help@mahout.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@mahout.apache.org
Received-SPF: pass (athena.apache.org: domain of dlieu.7@gmail.com designates
 209.85.214.178 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <DUB119-W13517A3852965F09E16167B88A0@phx.gbl>
References: <DUB119-W166228D44F7940A8556C7AB88A0@phx.gbl>
	<CADHDM+aOO8Wjh7u3qwSjzLdk1pndx9Ou5hnVrAC3_56zF2=h6w@mail.gmail.com>
	<DUB119-W38A391C6E6341FB0F72F68B88A0@phx.gbl>
	<CAJwFCa2s8pW2LbptSRdvtx9cn9BkYJ25aw90pER9onM12yP0mA@mail.gmail.com>
	<DUB119-W27A006A1BE37122CC4CC3FB88A0@phx.gbl>
	<CAPud8TquXTuyBMSJWbe-meodCykw2FxucX5BrUDvUPPkz+yKEQ@mail.gmail.com>
	<DUB119-W13517A3852965F09E16167B88A0@phx.gbl>
Date: Mon, 24 Jun 2013 14:01:20 -0700
Message-ID: 
 <CAPud8TqFY=kCXM_4d=PO6ou4nNQamJWXCngpX=bHLgh0OU6Tdw@mail.gmail.com>
Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
From: Dmitriy Lyubimov <dlieu.7@gmail.com>
To: user@mahout.apache.org
Content-Type: multipart/alternative; boundary=089e0158a9fee90fcb04dfecb649

--089e0158a9fee90fcb04dfecb649
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Jun 24, 2013 at 1:35 PM, Michael Kazekin <kazmikh@hotmail.com> wrote:

> I agree with you, I should have mentioned earlier that it would be good to
> separate "noise from data" and deal with only what is separable. Of course
> there is no truly deterministic implementation of any algorithm, but I
> would expect to see "credible" results on a macro-level (in our case it
> would be nice to see the same order of recommendations given the fixed
> seed). That seems important for experiments (and for testing, as mentioned),
> doesn't it?
> Another question is that afaik ALS-WR is deterministic by its inception,


I am not sure I know of a deterministic version of any flavor of ALS, ALS-WR
included. You can make it deterministic by fixing the seed, but there's no
benefit to that w.r.t. prediction credibility. The problem is guaranteed to
converge, but there's always going to be an infinitesimal delta between the
actual loss and the best loss, and at some point further improvements do not
cover the computational cost. I usually stop whenever I don't see more than
a 5% training cost improvement w.r.t. previous iterations. In fact, model
parameters will often have much more effect on model credibility than
achieving the ideal training cost.
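
To make the two points above concrete, here is a minimal sketch. It is not
Mahout code: the class name SeededAlsSketch, the rank-1 model, the dense toy
matrix, and the plain (unweighted) lambda are all assumptions made for
illustration, and ALS-WR itself weights the regularization differently. It
only shows that a fixed seed makes the loss trajectory repeatable, and that
the sweeps can be stopped once the relative improvement drops below 5%.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededAlsSketch {

    /** Regularized squared error of the rank-1 model u * v^T against r. */
    static double loss(double[][] r, double[] u, double[] v, double lambda) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            for (int j = 0; j < v.length; j++) {
                double e = r[i][j] - u[i] * v[j];
                sum += e * e;
            }
        }
        for (double x : u) sum += lambda * x * x;
        for (double x : v) sum += lambda * x * x;
        return sum;
    }

    /**
     * Runs rank-1 ALS from a seeded random start and returns the loss after
     * each sweep. Stops early once a sweep improves the loss by less than
     * minGain (e.g. 0.05 for 5%) relative to the previous sweep.
     */
    static double[] run(double[][] r, long seed, double lambda,
                        double minGain, int maxSweeps) {
        int m = r.length;
        int n = r[0].length;
        Random rnd = new Random(seed); // the only source of randomness
        double[] u = new double[m];
        double[] v = new double[n];
        for (int j = 0; j < n; j++) v[j] = rnd.nextDouble();

        double[] losses = new double[maxSweeps];
        int done = 0;
        while (done < maxSweeps) {
            // Fix v, solve the regularized least-squares problem for u.
            double vv = lambda;
            for (double x : v) vv += x * x;
            for (int i = 0; i < m; i++) {
                double s = 0.0;
                for (int j = 0; j < n; j++) s += r[i][j] * v[j];
                u[i] = s / vv;
            }
            // Fix u, solve for v.
            double uu = lambda;
            for (double x : u) uu += x * x;
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                for (int i = 0; i < m; i++) s += r[i][j] * u[i];
                v[j] = s / uu;
            }
            double cur = loss(r, u, v, lambda);
            losses[done++] = cur;
            if (done > 1) {
                double prev = losses[done - 2];
                if (prev - cur < minGain * prev) break; // gain below threshold
            }
        }
        return Arrays.copyOf(losses, done);
    }

    public static void main(String[] args) {
        double[][] r = {{5, 3, 1}, {4, 2, 1}, {1, 1, 5}}; // toy ratings
        System.out.println(Arrays.toString(run(r, 42L, 0.1, 0.05, 50)));
        System.out.println(Arrays.toString(run(r, 42L, 0.1, 0.05, 50)));
    }
}
```

Running it twice with the same seed prints the same sequence of losses;
changing the seed changes the starting point and, in general, the sequence.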


> so I'm trying to understand the reasons (and I'm assuming there are some)
> for the specific implementation design.
>
> Thanks for a free lunch! ;)
> Cheers, Mike.
>
> > Date: Mon, 24 Jun 2013 13:13:20 -0700
> > Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
> > From: dlieu.7@gmail.com
> > To: user@mahout.apache.org
> >
> > On Mon, Jun 24, 2013 at 1:07 PM, Michael Kazekin <kazmikh@hotmail.com> wrote:
> >
> > > Thank you, Ted!
> > > Any feedback on the usefulness of such functionality? Could it increase
> > > the 'playability' of the recommender?
> > >
> >
> > Almost all methods -- even deterministic ones -- will have a "credible
> > interval" of prediction simply because method assumptions do not hold 100%
> > in real life, on real data. So what you really want to know in such cases is
> > the credible interval rather than whether the method is deterministic or not.
> > Non-deterministic methods might very well be more accurate than
> > deterministic ones in this context and, therefore, more "useful". Also
> > see: "no free lunch theorem".
> >
> >
> > > > From: ted.dunning@gmail.com
> > > > Date: Mon, 24 Jun 2013 20:46:43 +0100
> > > > Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
> > > > To: user@mahout.apache.org
> > > >
> > > > See org.apache.mahout.common.RandomUtils#useTestSeed
> > > >
> > > > It provides the ability to freeze the initial seed.  Normally this is only
> > > > used during testing, but you could use it.
> > > >
> > > >
> > > > On Mon, Jun 24, 2013 at 8:44 PM, Michael Kazekin <kazmikh@hotmail.com> wrote:
> > > >
> > > > > Thanks a lot!
> > > > > Do you know by any chance what the underlying reasons are for including
> > > > > such mandatory random seed initialization?
> > > > > Do you see any sense in providing another option, such as filling them
> > > > > with zeroes in order to ensure consistency and repeatability? (For
> > > > > example, we might want to track and compare the generated recommendation
> > > > > lists for different parameters, such as the number of features or the
> > > > > number of iterations.)
> > > > > M.
> > > > >
> > > > >
> > > > > > Date: Mon, 24 Jun 2013 19:51:44 +0200
> > > > > > Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
> > > > > > From: ssc@apache.org
> > > > > > To: user@mahout.apache.org
> > > > > >
> > > > > > The matrices of the factorization are initialized randomly. If you fix the
> > > > > > random seed (would require modification of the code) you should get exactly
> > > > > > the same results.
> > > > > > On 24.06.2013 at 13:49, "Michael Kazekin" <kazmikh@hotmail.com> wrote:
> > > > > >
> > > > > > > Hi!
> > > > > > > Should I assume that, given the same dataset and the same parameters for the
> > > > > > > factorizer and recommender, I will get the same results for any specific user?
> > > > > > > My current understanding is that the ALS-WR algorithm could theoretically
> > > > > > > guarantee this, but I was wondering whether there could be any numerical-method
> > > > > > > issues and/or implementation-specific concerns.
> > > > > > > I would appreciate any light shed on this issue.
> > > > > > > Mike.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > >
> > >
>
>

--089e0158a9fee90fcb04dfecb649--