From: Sebastian Schelter
Reply-To: ssc@apache.org
To: user@mahout.apache.org
Subject: Re: ItemSimilarityJob Cooccurrence Question
Date: Sat, 04 Jun 2011 23:28:36 +0200
Message-ID: <4DEAA384.4010204@apache.org>
In-Reply-To: <1307222513108-3024516.post@n3.nabble.com>

Hi Derek,

this shouldn't be happening and we have unit tests explicitly checking that.
Which version do you use? Please be sure to use Mahout 0.5 or the current
trunk.

Could you provide sample data where you see this happening?

--sebastian

On 04.06.2011 23:21, djn wrote:
> Regarding ItemSimilarityJob, it is my understanding that if there are two
> input lines of the form <user1, product1> and <user1, product2>,
> then that would constitute a co-occurrence between product1 and product2.
>
> I've generated a large test dataset under this assumption, and it guarantees
> that there will only be co-occurrences between pairs of product IDs that
> I've predefined. I'm not using preference values and I'm setting
> --booleanData true.
>
> While the ItemSimilarityJob's output does include these predefined
> co-occurrences, it also outputs a large number of co-occurrences (with small
> co-occurrence counts) between products that are not co-occurring in the
> input dataset. Can anyone provide some insight as to why this might be
> happening?
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/ItemSimilarityJob-Cooccurrence-Question-tp3024516p3024516.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
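
One way to put together the sample data asked for above is to compute the expected co-occurrences directly from the input pairs, independently of Mahout, and compare them with the job's output. The following is only a minimal sketch in Python of the definition given in the quoted question (two items co-occur when the same user appears with both). It does not reproduce ItemSimilarityJob's internals, and the function name `cooccurrence_counts` and the sample IDs are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(pairs):
    """Count, for each unordered item pair, how many users have both items.

    `pairs` is an iterable of (user, item) tuples, i.e. boolean data
    without preference values. This follows the definition from the
    question only; it is NOT Mahout's implementation.
    """
    # Collect the distinct items seen for each user.
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)

    # Every pair of distinct items belonging to the same user is one co-occurrence.
    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical sample input: user1 and user2 both have product1 and product2.
sample = [
    ("user1", "product1"),
    ("user1", "product2"),
    ("user2", "product1"),
    ("user2", "product2"),
    ("user3", "product3"),
]

print(cooccurrence_counts(sample))
# prints {('product1', 'product2'): 2}
```

Any item pair that shows up in the job's output but is missing from this dictionary would be a candidate for the small reproducing dataset.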