Date: Fri, 26 Nov 2010 18:32:57 +0000
Subject: Re: RecommenderJob in mahout-0.4 returning 1.0 score for each recommendation
From: Sean Owen
To: user@mahout.apache.org

This is because all the ratings are implicitly 1.0 when there are no ratings.

But I actually think this is symptomatic of a problem, since I note that those recommendations are quite suspiciously in order by item ID. I am not sure the current state of the distributed recommender is compatible with boolean data, but I am not an expert here -- Sebastian, can we discuss what might be going on here?

In the non-distributed code, items are given a "fake" estimated preference, which is not actually an estimated preference (because that would always be 1.0) but some other number that functions as a score -- the average similarity to other items, for example. This is used for ranking and is also returned as an "estimated preference" even though it's not. Can we do something like that here? Or is it already working this way if certain values / options are set?
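To make the scoring idea above concrete, here is a minimal standalone sketch in Python. It is not Mahout code and does not claim to match what the non-distributed recommender actually computes. It assumes boolean input as (user, item) pairs and scores each candidate item by its average Tanimoto similarity to the items the user already has. The names tanimoto and rank_items are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_items(pairs, user, top_n=10):
    """Rank items the user does not have yet, using boolean (user, item) pairs.

    Each candidate's score is its average Tanimoto similarity to the items
    the user already has. This is an assumed scoring rule for illustration,
    used in place of an estimated preference that would always be 1.0.
    """
    users_by_item = defaultdict(set)
    items_by_user = defaultdict(set)
    for u, i in pairs:
        users_by_item[i].add(u)
        items_by_user[u].add(i)

    owned = items_by_user.get(user, set())
    if not owned:
        return []  # nothing to compare against for an unknown user

    scores = {}
    for candidate, candidate_users in users_by_item.items():
        if candidate in owned:
            continue
        sims = [tanimoto(candidate_users, users_by_item[o]) for o in owned]
        scores[candidate] = sum(sims) / len(sims)

    # Highest score first; ties broken by item ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

With a rule like this the returned values differ per item and the list is ordered by score, so output sorted by item ID with a constant 1.0 would stand out as a sign that no ranking score was applied.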

On Fri, Nov 26, 2010 at 6:26 PM, Jordi Abad wrote:
> Hi,
>
> I'm running a RecommenderJob (mahout-0.4 version) over hadoop like this:
>
> hadoop-0.20 jar /mahout-distribution-0.4/mahout-core-0.4-job.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> -Dmapred.input.dir=input -Dmapred.output.dir=output -s
> SIMILARITY_TANIMOTO_COEFFICIENT -b true
>
> The job works fine but when I examine the result I get things like:
>
> 12    [1:1.0,2:1.0,3:1.0,5:1.0,6:1.0,11:1.0,168:1.0,173:1.0,180:1.0,199:1.0]
> 14    [1:1.0,2:1.0,3:1.0,5:1.0,6:1.0,11:1.0,14:1.0,21:1.0,22:1.0,23:1.0]
> ...
>
> I can't understand why each recommendation gets 1.0 of score. It doesn't
> matter which SimilarityClass I set. I always get a score of 1.0.
>
> My input file is a "boolean file" (1391374 rows) with values like:
>
> 1,6496241
> 1,4368916
> 1,4922226
> 1,4958662
> ...
>
> If I run
> "org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob" job
> over the same file I get good results for items.
>
> Any ideas?
>
> Thanks in advance.
>