Date: Mon, 24 Jun 2013 14:01:20 -0700
Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
From: Dmitriy Lyubimov
To: user@mahout.apache.org

On Mon, Jun 24, 2013 at 1:35 PM, Michael Kazekin wrote:

> I agree with you, I should have mentioned earlier that it would be good to
> separate "noise from data" and deal with only what is separable. Of course
> there is no truly deterministic implementation of any algorithm, but I
> would expect to see "credible" results on a macro-level (in our case it
> would be nice to see the same order of recommendations given the fixed
> seed). It seems important for experiments (and for testing, as mentioned),
> isn't it?
> Another question is that afaik ALS-WR is deterministic by its inception,

I am not sure I know of a deterministic version of any flavor of ALS,
ALS-WR included. You can make it deterministic by fixing the seed, but there
is no benefit to that w.r.t. prediction credibility. The problem is
guaranteed to converge, but there is always going to be an infinitesimally
small delta between the actual loss and the best loss; at some point further
improvements do not cover the computational cost. I usually stop whenever I
don't see more than a 5% training cost improvement w.r.t. previous
iterations. In fact, model parameters will often have much more effect on
model credibility than achieving the ideal training cost.

> so I'm trying to understand the reasons (and I'm assuming there are some)
> for the specific implementation design.
>
> Thanks for a free lunch! ;)
> Cheers,
> Mike.
>
> > Date: Mon, 24 Jun 2013 13:13:20 -0700
> > Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
> > From: dlieu.7@gmail.com
> > To: user@mahout.apache.org
> >
> > On Mon, Jun 24, 2013 at 1:07 PM, Michael Kazekin wrote:
> >
> > > Thank you, Ted!
> > > Any feedback on the usefulness of such functionality? Could it increase
> > > the 'playability' of the recommender?
> >
> > Almost all methods -- even deterministic ones -- will have a "credible
> > interval" of prediction simply because method assumptions do not hold 100%
> > in real life, real data. So what you really want to know in such cases is
> > the credible interval rather than whether method is deterministic or not.
> > Non-deterministic methods might very well be more accurate than
> > deterministic ones in this context, and, therefore, more "useful". Also
> > see: "no free lunch theorem".
> >
> > > > From: ted.dunning@gmail.com
> > > > Date: Mon, 24 Jun 2013 20:46:43 +0100
> > > > Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
> > > > To: user@mahout.apache.org
> > > >
> > > > See org.apache.mahout.common.RandomUtils#useTestSeed
> > > >
> > > > It provides the ability to freeze the initial seed. Normally this is only
> > > > used during testing, but you could use it.
> > > >
> > > > On Mon, Jun 24, 2013 at 8:44 PM, Michael Kazekin <kazmikh@hotmail.com> wrote:
> > > >
> > > > > Thanks a lot!
> > > > > Do you know by any chance what are the underlying reasons for including
> > > > > such mandatory random seed initialization?
> > > > > Do you see any sense in providing another option, such as filling them
> > > > > with zeroes in order to ensure the consistency and repeatability? (for
> > > > > example we might want to track and compare the generated recommendation
> > > > > lists for different parameters, such as the number of features or
> > > > > number of iterations etc.)
> > > > > M.
> > > > >
> > > > > > Date: Mon, 24 Jun 2013 19:51:44 +0200
> > > > > > Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
> > > > > > From: ssc@apache.org
> > > > > > To: user@mahout.apache.org
> > > > > >
> > > > > > The matrices of the factorization are initialized randomly. If you fix the
> > > > > > random seed (would require modification of the code) you should get
> > > > > > exactly the same results.
> > > > > >
> > > > > > On 24.06.2013 13:49, "Michael Kazekin" <kazmikh@hotmail.com> wrote:
> > > > > >
> > > > > > > Hi!
> > > > > > > Should I assume that under same dataset and same parameters for factorizer
> > > > > > > and recommender I will get the same results for any specific user?
> > > > > > > My current understanding that theoretically ALS-WR algorithm could
> > > > > > > guarantee this, but I was wondering could be there any numeric method
> > > > > > > issues and/or implementation-specific concerns.
> > > > > > > Would appreciate any highlight on this issue.
> > > > > > > Mike.
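
For readers of the archive, the two ideas in this thread (a fixed seed for
the random initialization, and stopping once the training cost improves by
less than 5%) can be illustrated with a minimal single-machine sketch. It is
not Mahout's implementation: the class and method names are hypothetical, it
uses plain java.util.Random in place of Mahout's RandomUtils, it factorizes
with a single latent feature, and it marks missing ratings with NaN.

```java
import java.util.Random;

/** Rank-1 ALS with weighted-lambda regularization; illustrative only, not Mahout code. */
final class SeededAlsSketch {

    /** Regularized squared error over observed entries (NaN marks a missing rating). */
    static double cost(double[][] r, double[] u, double[] v, double lambda) {
        double c = 0.0;
        for (int i = 0; i < r.length; i++) {
            for (int j = 0; j < v.length; j++) {
                if (Double.isNaN(r[i][j])) continue;
                double e = r[i][j] - u[i] * v[j];
                // Summed over observed ratings this weights u_i^2 by the number of
                // ratings in row i, and v_j^2 by the number of ratings in column j.
                c += e * e + lambda * (u[i] * u[i] + v[j] * v[j]);
            }
        }
        return c;
    }

    /** Closed-form update of one side's factors while the other side is held fixed. */
    static void solve(double[][] r, double[] target, double[] fixed, boolean rows, double lambda) {
        for (int a = 0; a < target.length; a++) {
            double num = 0.0, den = 0.0;
            int count = 0;
            for (int b = 0; b < fixed.length; b++) {
                double x = rows ? r[a][b] : r[b][a];
                if (Double.isNaN(x)) continue;
                num += x * fixed[b];
                den += fixed[b] * fixed[b];
                count++;
            }
            target[a] = count == 0 ? 0.0 : num / (den + lambda * count);
        }
    }

    /** Returns {userFactors, itemFactors}; minGain = 0.05 means "stop below 5% improvement". */
    static double[][] factorize(double[][] r, long seed, double lambda, double minGain, int maxIter) {
        double[] u = new double[r.length];
        double[] v = new double[r[0].length];
        Random rnd = new Random(seed); // fixed seed: same starting point on every run
        for (int j = 0; j < v.length; j++) v[j] = rnd.nextDouble();
        double prev = Double.NaN;
        for (int it = 0; it < maxIter; it++) {
            solve(r, u, v, true, lambda);   // update user factors, items fixed
            solve(r, v, u, false, lambda);  // update item factors, users fixed
            double c = cost(r, u, v, lambda);
            // Stop when the relative improvement over the previous iteration is too small.
            if (it > 0 && prev - c < minGain * prev) break;
            prev = c;
        }
        return new double[][] {u, v};
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        double[][] ratings = { {5, 3, nan}, {4, nan, 1}, {nan, 2, 1} };
        double[][] a = factorize(ratings, 42L, 0.05, 0.05, 50);
        double[][] b = factorize(ratings, 42L, 0.05, 0.05, 50);
        // Same seed, same data, same single-threaded order of operations.
        System.out.println(java.util.Arrays.equals(a[0], b[0])
                && java.util.Arrays.equals(a[1], b[1])); // prints true
    }
}
```

In this sketch an all-zero initialization would leave every factor at zero,
because each update multiplies the ratings by the factors of the other side.
That is one reason to start from random values. The repeatability shown here
holds for a single-threaded run; it says nothing about the distributed job.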