Subject: Re: PFPGrowth - weird output?
From: Vipul Pandey
Date: Sat, 5 Feb 2011 09:22:10 -0800
To: user@mahout.apache.org
In-Reply-To: <6C870DB6C8A84F41898F05D559707F234660F6@008-AM1MPN1-012.mgdnok.nokia.com>

Hey Praveen,

Thanks for responding.

> Frequent patterns are reported per feature, which is why you are seeing
> the two patterns twice. First one is for feature 1518311 and second one
> is for feature 1476937.

That's what I thought, but then the different support values made me dizzy!
Also, it seems like it's not just about reporting the pattern for each
feature but for each combination of features:

> 22 *1476937* 720020 *1518311*
> 30 *1518311* *1476937* 720020
> 30 720020 *1518311* *1476937*
> 34 720020 *1476937* *1518311*
> 38 *1518311* 720020 *1476937*
> 42 *1476937* *1518311* 720020

Here you can see every possible permutation of the three items, each
registering its own support.

> Are you running on a multi-node Hadoop cluster? If so, did you read all
> the output files?

I ran locally and then on a small 4-node cluster.
I'm reading the parts file under the frequentpatterns directory.
Let me try to run it on a smaller scale and get you the output soon.

Thanks!
Vipul

On Feb 3, 2011, at 6:44 PM, wrote:

> Hi Vipul,
> Frequent patterns are reported per feature, which is why you are seeing
> the two patterns twice. First one is for feature 1518311 and second one
> is for feature 1476937.
>
> However, both should have exactly the same support. I am not sure why you
> have different support for the same itemset. Maybe if you send the
> full output from Mahout as it is, we could take a look.
>
> Are you running on a multi-node Hadoop cluster? If so, did you read all
> the output files?
>
> Praveen
> ________________________________________
> From: ext Vipul Pandey [vipandey@gmail.com]
> Sent: Thursday, February 03, 2011 8:21 PM
> To: user@mahout.apache.org
> Subject: PFPGrowth - weird output?
>
> Hi all!
>
> I'm trying to run PFPGrowth on my data and this is the output I get.
> (Please note that I parse the output in the frequentpatterns folder and
> generate this output with the support followed by the itemset.)
>
> support : Itemset
> *234 1518311 1476937*
> 235 55843184
> 238 1238079
> 244 34541
> 247 4516454
> 252 106478
> 252 670864
> *254 1476937 1518311*
>
> You can see that two items are reported twice (*1518311 1476937*) with
> different supports.
>
> And below are all the occurrences of these two items together. If you
> notice, it has all the permutations of the three items (*1476937*
> *720020* *1518311*):
>
> 22 *1476937* 720020 *1518311*
> 30 *1518311* *1476937* 720020
> 30 720020 *1518311* *1476937*
> 34 720020 *1476937* *1518311*
> 38 *1518311* 720020 *1476937*
> 42 *1476937* *1518311* 720020
> 234 *1518311* *1476937*
> 254 *1476937* *1518311*
>
> Does this mean that to get the support of just the pair (*1476937*
> *1518311*) I will have to add all of them up!?
>
> Even in that case, this total comes out to *684*, and if I count the
> number of co-occurrences of these two items in the original baskets, the
> support is *766*. Why is there a difference? Any idea?
>
> Thanks!
> Vipul
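
For anyone checking the same thing on their own data, here is a minimal Python sketch of the two calculations discussed in this thread: grouping the reported rows by itemset while ignoring item order, and counting a pair's support directly from the baskets. The function names `group_by_itemset` and `pair_support` are made up for illustration and are not part of Mahout, and the sketch assumes the frequentpatterns output has already been parsed into (support, items) rows. Under the usual definition of support, a basket that contains the pair plus a third item already counts toward the pair, so summing pair rows and triple rows is not expected to reproduce the basket count.

```python
from collections import defaultdict


def group_by_itemset(patterns):
    """Group (support, items) rows by their items, ignoring item order."""
    grouped = defaultdict(list)
    for support, items in patterns:
        grouped[frozenset(items)].append(support)
    return dict(grouped)


def pair_support(baskets, a, b):
    """Count baskets containing both a and b; each basket counts once."""
    count = 0
    for basket in baskets:
        items = set(basket)  # duplicates inside a basket do not add support
        if a in items and b in items:
            count += 1
    return count


# Rows copied from the output quoted in this thread.
reported = [
    (22, ["1476937", "720020", "1518311"]),
    (30, ["1518311", "1476937", "720020"]),
    (30, ["720020", "1518311", "1476937"]),
    (34, ["720020", "1476937", "1518311"]),
    (38, ["1518311", "720020", "1476937"]),
    (42, ["1476937", "1518311", "720020"]),
    (234, ["1518311", "1476937"]),
    (254, ["1476937", "1518311"]),
]

grouped = group_by_itemset(reported)
for items, supports in grouped.items():
    print(sorted(items), supports)

# Naive sum over every row, as done in the original message.
total = sum(support for support, _ in reported)
print(total)  # 684
```

To compare against the raw data, `pair_support(baskets, "1476937", "1518311")` can be run over the original baskets, where `baskets` is any iterable of item lists; the thread reports 766 for that count.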