From: praveen.peddi@nokia.com
To: user@mahout.apache.org
Date: Tue, 9 Nov 2010 19:53:02 +0100
Subject: RE: Deriving associations from frequent patterns

Hi Anil,

Here is the result for the same features with g=1:

Key: 46705: Value: ([46705],2526), ([46705, 46840],698)
Key: 46485: Value: ([46485],936), ([46705, 46485],355),
  ([46840, 46485],329), ([46847, 46485],211),
  ([46705, 46840, 46485],207), ([46485, 46815],175),
  ([46485, 46852],159), ([46840, 46847, 46485],130),
  ([46705, 46847, 46485],126), ([46705, 46485, 46815],105),
  ([46840, 46485, 46815],97), ([46840, 46485, 46852],96),
  ([46847, 46485, 46815],94), ([46705, 46485, 46852],93),
  ([46705, 46840, 46847, 46485],92), ([20975, 46485],92),
  ([16794, 46485],80), ([46847, 46485, 46852],76),
  ([46705, 46840, 46485, 46815],75), ([46485, 46852, 46815],75),
  ([46705, 46840, 46485, 46852],69), ([20924, 46485],68),
  ([46705, 46847, 46485, 46815],67), ([46840, 46847, 46485, 46815],66),
  ([20975, 46705, 46840, 46485],65), ([46840, 46847, 46485, 46852],56),
  ([20975, 46705, 46485],55), ([20975, 46840, 46485],54),
  ([46705, 46840, 46847, 46485, 46815],53)

The full result for the same features with g=500 is:

Key: 46705: Value: ([46705],2526)
Key: 46485: Value: ([46485],936), ([46705, 46485],355),
  ([46840, 46485],329), ([46847, 46485],211),
  ([46705, 46840, 46485],205), ([46840, 46847, 46485],127),
  ([46705, 46847, 46485],124), ([20975, 46485],92),
  ([46705, 46840, 46847, 46485],90), ([20975, 46705, 46485],55),
  ([20975, 46840, 46485],54), ([21243, 46485],47),
  ([20975, 46705, 46840, 46485],43), ([39140, 46485],37),
  ([20975, 46847, 46485],31), ([20975, 46840, 46847, 46485],27),
  ([20975, 46705, 46847, 46485],26),
  ([20975, 46705, 46840, 46847, 46485],23),
  ([27984, 46705, 46485],23), ([21243, 46840, 46485],22),
  ([21243, 46705, 46485],21), ([39140, 46840, 46485],19),
  ([21243, 46847, 46485],18), ([39140, 46705, 46485],15),
  ([21243, 46705, 46840, 46485],14), ([6942, 46485],14),
  ([21243, 46840, 46847, 46485],13), ([39140, 46847, 46485],13),
  ([39140, 46840, 46847, 46485],11), ([20975, 39140, 46485],11),
  ([20975, 21243, 46485],11), ([39140, 46705, 46840, 46485],10),
  ([27984, 46705, 46840, 46847, 46485],9),
  ([39140, 46705, 46847, 46485],9), ([20975, 27984, 46705, 46485],8),
  ([39140, 46705, 46840, 46847, 46485],7),
  ([20975, 27984, 46705, 46840, 46485],7),
  ([21243, 46705, 46847, 46485],7), ([20975, 39140, 46840, 46485],7),
  ([6942, 46705, 46485],7), ([21243, 46705, 46840, 46847, 46485],6),
  ([20975, 21243, 46840, 46847, 46485],6), ([21243, 27984, 46485],6),
  ([39140, 27984, 46485],6), ([6942, 46840, 46485],6),
  ([20975, 27984, 46705, 46847, 46485],5),
  ([39140, 27984, 46847, 46485],5), ([20975, 39140, 46705, 46485],5),
  ([21243, 39140, 46485],5), ([4873, 46485],5)

The results are obviously different. This raises another question: are
the frequent patterns supposed to change with different values of g?

Praveen

-----Original Message-----
From: ext Robin Anil [mailto:robin.anil@gmail.com]
Sent: Tuesday, November 09, 2010 1:11 PM
To: user@mahout.apache.org
Subject: Re: Deriving associations from frequent patterns

Can you try with g=1 and tell me the result?

On Tue, Nov 9, 2010 at 11:37 PM, wrote:

> Here is the command I used to run PFPGrowth. I am still using only a
> single machine. I will be setting up a Hadoop cluster soon.
>
> $ hadoop jar mahout-examples-0.4-job.jar \
>     org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver -i downloads-input \
>     -o reco-patterns-output -k 50 -method mapreduce -g 10 \
>     -regex '[\ ]' -s 500
>
> -----Original Message-----
> From: ext Robin Anil [mailto:robin.anil@gmail.com]
> Sent: Tuesday, November 09, 2010 1:01 PM
> To: user@mahout.apache.org
> Subject: Re: Deriving associations from frequent patterns
>
> On Tue, Nov 9, 2010 at 11:20 PM, wrote:
>
> > Hi Anil,
> > 1. I am not sure I understand your answer to #1 (or were you
> > asking me a question?). Could you please clarify? The sample
> > patterns I gave are only a small subset of the output. I included
> > only those two features for simplicity.
> >
> Oh. Never mind. Let me see.
>
> >
> > 2. I am sending the gzipped sample transaction file (1M downloads)
> > to your private email, since I am not sure whether I can attach
> > files to the mailing list.
> > Please check your email for the sample file.
> >
> > Praveen
> >
> > -----Original Message-----
> > From: ext Robin Anil [mailto:robin.anil@gmail.com]
> > Sent: Tuesday, November 09, 2010 12:40 PM
> > To: user@mahout.apache.org
> > Subject: Re: Deriving associations from frequent patterns
> >
> > On Tue, Nov 9, 2010 at 9:50 PM, wrote:
> >
> > > Hello all,
> > > I am new to Mahout. I have just started looking into Mahout to
> > > replace our current FP-growth implementation with the parallel
> > > FP-growth that Mahout provides, since we have started having
> > > scalability issues. I looked at the PFPGrowth documentation and
> > > noticed that it produces only the top K frequent patterns, not
> > > the associations, and what we need is associations. So I was
> > > thinking of implementing a simple AssociationGenerator that
> > > takes the frequent patterns output as its input. However, I am
> > > not sure what the best way is to generate associations from the
> > > frequent patterns produced by Mahout.
> > >
> > > I have the following sample output from Mahout.
> > >
> > > Key: 46485: Value: ([46485],936), ([46705, 46485],355)
> > > Key: 46705: Value: ([46705],2526)
> > >
> > > We are interested only in item sets of size 2, since we need
> > > only associations with 1 antecedent and 1 consequent.
> > >
> > > I was planning to calculate associations with confidence as
> > > follows:
> > > for each key above as A {
> > >   for each two-item set as [A,C] {
> > >     confidence(A->C) = support(A->C)/support(A);
> > >     add association (A, C, confidence(A->C)) to the list;
> > >   }
> > > }
> > >
> > > Keeping the above requirement and pseudocode in mind, my
> > > questions are as follows:
> > > 1. Is the above algorithm efficient?
> > >
> > You are running it over a set of top-K patterns. It's small, so it
> > doesn't matter whether it is efficient or not.
> >
> > > 2. In the first pattern, [46705, 46485] occurred 355 times, but
> > > why is the same pattern not repeated in the second one? Because
> > > of this, calculating confidence (46705 -> 46485) becomes
> > > difficult. As you can see from the above code, I was planning to
> > > read the patterns for each feature and calculate the confidence
> > > of all associations with that feature as the antecedent. But
> > > when I read feature 46705, I cannot calculate the confidence of
> > > (46705 -> 46485), since the item set is not included with the
> > > feature.
> > >
> > Good question. I guess the partitioning is screwing this up, as
> > there are K-1 other patterns in the list with counts greater than
> > 355. Can you give me a sample to test?
> >
> > > 3. Has anyone implemented associations from the generated
> > > frequent patterns?
> > >
> > Nope.
> >
> > >
> > > Thanks
> > > Praveen
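
For reference, the association step asked about in the original question could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not part of Mahout: the function names parse_patterns and pair_rules are hypothetical, the input is assumed to be the text form "Key: K: Value: ([items],count), ..." shown in this thread, and confidence(A->C) is taken as support(A,C)/support(A). Supports are pooled across all keys, so a pair listed only under key 46485 can still be used for the rule 46705 -> 46485. If an item set shows up under several keys with different counts, the sketch keeps the largest one, which is a simplification rather than anything Mahout guarantees.

```python
import re

# Matches one "([item, item, ...],count)" entry in a dumped pattern line.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")


def parse_patterns(lines):
    """Map each item set (as a frozenset of ints) to its support count."""
    support = {}
    for line in lines:
        for items, count in PATTERN_RE.findall(line):
            itemset = frozenset(int(i) for i in items.split(","))
            # Assumption: if an item set repeats across keys, keep the max.
            support[itemset] = max(support.get(itemset, 0), int(count))
    return support


def pair_rules(support):
    """Return sorted (antecedent, consequent, confidence) for 2-item sets."""
    rules = []
    for itemset, pair_count in support.items():
        if len(itemset) != 2:
            continue
        for a in itemset:
            (c,) = itemset - {a}
            single = support.get(frozenset([a]))
            if single:  # skip rules whose antecedent support is unknown
                rules.append((a, c, pair_count / single))
    return sorted(rules)


if __name__ == "__main__":
    # Sample output quoted in the original question.
    sample = [
        "Key: 46485: Value: ([46485],936), ([46705, 46485],355)",
        "Key: 46705: Value: ([46705],2526)",
    ]
    for a, c, conf in pair_rules(parse_patterns(sample)):
        print(f"{a} -> {c}: confidence {conf:.4f}")
```

Because the top-K lists are small, a single pass that holds all supports in memory should be enough. A rule is skipped when the single-item support of its antecedent is missing from the output.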