From: Greg H
To: user@mahout.apache.org
Date: Fri, 25 Nov 2011 17:27:28 +0900
Subject: Re: ItemSimilarityJob's results differ from non-distributed version
In-Reply-To: <4ECF3DDC.902@apache.org>

Hi Sebastian,

I converted the dataset by simply keeping all user/item pairs that had a
rating above 3. I'm also using GenericItemBasedRecommender's
mostSimilarItems method instead of the recommend method to make
recommendations.

I'm certainly open to suggestions on better evaluation metrics. I'm just
using the top 5 because it was easy to implement.

Thanks,
Greg

On Fri, Nov 25, 2011 at 4:03 PM, Sebastian Schelter wrote:

> Hi Greg,
>
> You should get the same results, can you describe exactly how you
> converted the dataset? I'd like to try this myself, maybe you found some
> subtle bug.
>
> I also have doubts whether taking the precision of the top 5 recommended
> items is really a good quality measure.
>
> --sebastian
>
> On 25.11.2011 02:41, Greg H wrote:
> > Thanks for the replies Sebastian and Sean. I looked at the similarity
> > values and they are the same, but ItemSimilarityJob is calculating
> > fewer of them. So it must be still doing some sort of sampling. I
> > thought that I could force it to use all of the data by setting
> > maxPrefsPerUser sufficiently large. Could there be another reason for
> > it not to calculate all of the similarity values?
> >
> > I also tried to use a smaller amount of similarItemsPerItem but this
> > leads to worse results.
> >
> > Thanks again,
> > Greg
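
For anyone who wants to reproduce the conversion described at the top of this message, here is a minimal sketch of it. It is not the code actually used; it assumes the ratings are available as comma-separated "user,item,rating" lines, and the class name BinarizeRatings and method keepAbove are made up for illustration. It keeps only the user/item pairs whose rating is strictly above the threshold and drops the rating value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: turns rated data into
// preference-less "user,item" pairs by thresholding the rating.
final class BinarizeRatings {

    // Assumes each input line looks like "user,item,rating".
    // Returns "user,item" for every line whose rating is strictly
    // greater than the threshold; lines with fewer than three
    // fields are skipped.
    static List<String> keepAbove(List<String> lines, double threshold) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split(",");
            if (fields.length < 3) {
                continue; // not a complete user,item,rating record
            }
            double rating = Double.parseDouble(fields[2].trim());
            if (rating > threshold) {
                kept.add(fields[0].trim() + "," + fields[1].trim());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<String> sample = Arrays.asList("1,10,5", "1,11,3", "2,10,4", "2,12,1");
        for (String pair : keepAbove(sample, 3.0)) {
            System.out.println(pair);
        }
    }
}
```

Note that a rating of exactly 3 is dropped, since the filter is "above 3" rather than "3 or above". If the distributed and non-distributed runs were fed files produced with different cut-offs, that alone would change the results.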