Can you try with -g 1 and tell me the result?

On Tue, Nov 9, 2010 at 11:37 PM, wrote:

> Here is the command I used to run PFPGrowth. I am still using only a single
> machine. Will be setting up a Hadoop cluster soon.
>
> $ hadoop jar mahout-examples-0.4-job.jar
> org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver -i downloads-input
> -o reco-patterns-output -k 50 -method mapreduce -g 10
> -regex '[\ ]' -s 500
>
> -----Original Message-----
> From: ext Robin Anil [mailto:robin.anil@gmail.com]
> Sent: Tuesday, November 09, 2010 1:01 PM
> To: user@mahout.apache.org
> Subject: Re: Deriving associations from frequent patterns
>
> On Tue, Nov 9, 2010 at 11:20 PM, wrote:
>
> > Hi Anil,
> > 1. I am not sure if I understand your answer to #1 (or were you asking
> > me a question?). Could you please clarify? The sample patterns I gave are
> > only a small subset of the output. I included only those two
> > features for simplicity.
> >
> Oh. Never mind. Let me see.
>
> > 2. I am sending the gzipped sample transaction file (1M downloads) to
> > your private email since I am not sure if I can attach files to the
> > mailing list.
> > Please check your email for the sample file.
> >
> > Praveen
> >
> > -----Original Message-----
> > From: ext Robin Anil [mailto:robin.anil@gmail.com]
> > Sent: Tuesday, November 09, 2010 12:40 PM
> > To: user@mahout.apache.org
> > Subject: Re: Deriving associations from frequent patterns
> >
> > On Tue, Nov 9, 2010 at 9:50 PM, wrote:
> >
> > > Hello all,
> > > I am new to Mahout. I have just started looking into Mahout to
> > > replace our current FP-growth implementation with the parallel
> > > FP-growth that Mahout provides, since we started having scalability
> > > issues. I looked at the PFPGrowth documentation and noticed that it
> > > only produces the top K frequent patterns but not the associations,
> > > and what we need is associations. So I was thinking of implementing
> > > a simple AssociationGenerator given the frequent patterns output.
> > > However, I am not sure what the best way is to generate associations
> > > given the frequent patterns produced by Mahout.
> > >
> > > I have the following sample output from Mahout.
> > >
> > > Key: 46485: Value: ([46485],936), ([46705, 46485],355)
> > > Key: 46705: Value: ([46705],2526)
> > >
> > > We are interested only in item sets of size 2 since we need only 1
> > > ANTECEDENT to 1 CONSEQUENT associations.
> > >
> > > I was planning to calculate associations with confidence as follows:
> > > For each key above as A {
> > >   for each two-item set as [A,C] {
> > >     confidence(A->C) = support([A,C])/support(A);
> > >     add association (A, C, confidence(A->C)) to the list;
> > >   }
> > > }
> > >
> > > Keeping the above requirement and pseudo code in mind, my questions
> > > are as follows:
> > > 1. Is the above algorithm efficient?
> > >
> > You are running it over a set of top K patterns. It's small. It doesn't
> > matter if it's inefficient or not.
> >
> > > 2. In the first pattern, [46705, 46485] occurred 355 times, but why is
> > > the same pattern not repeated in the second pattern? Because of this,
> > > calculating confidence (46705 -> 46485) becomes difficult. As you
> > > can see from the above code, I was planning to read the patterns for
> > > each feature and calculate the confidence of all associations with
> > > that feature as antecedent. But when I read feature 46705, I cannot
> > > calculate the confidence of (46705 -> 46485) since the item set is
> > > not included with the feature.
> > >
> > Good question. I guess the partitioning is screwing this up, as there
> > are other K-1 patterns in the list > 355. Can you give a sample to test?
> >
> > > 3. Has anyone implemented associations from the generated frequent
> > > patterns?
> > >
> > Nope
> >
> > > Thanks
> > > Praveen
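
For reference, the association step discussed in the thread can be sketched in Python. This is only an illustration, not part of Mahout: the function names parse_patterns and generate_associations are hypothetical, and the input format is assumed to be exactly the "Key: ... Value: ([items],support), ..." text quoted above. To work around a pair being listed under only one key, the sketch first indexes every item set globally by its contents and only then computes confidence as support of the pair divided by support of the antecedent. If the top-K cut-off has dropped a single-item pattern, the corresponding rule is skipped.

```python
import re

# Matches one "([item, item, ...],support)" entry in the Value part of a line.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")


def parse_patterns(lines):
    """Collect the support of every item set, whichever key it was listed under."""
    supports = {}
    for line in lines:
        for items, count in PATTERN_RE.findall(line):
            itemset = frozenset(i.strip() for i in items.split(",") if i.strip())
            # The same item set may appear under several keys; keep the largest count.
            supports[itemset] = max(supports.get(itemset, 0), int(count))
    return supports


def generate_associations(supports):
    """Return (antecedent, consequent, confidence) for every two-item set."""
    rules = []
    for itemset, pair_support in supports.items():
        if len(itemset) != 2:
            continue  # only 1-antecedent to 1-consequent rules are wanted
        a, b = sorted(itemset)
        for antecedent, consequent in ((a, b), (b, a)):
            antecedent_support = supports.get(frozenset([antecedent]))
            if antecedent_support:  # skip if the single-item pattern is missing
                confidence = pair_support / antecedent_support
                rules.append((antecedent, consequent, confidence))
    return rules


# Sample lines copied from the output quoted in the thread.
sample = [
    "Key: 46485: Value: ([46485],936), ([46705, 46485],355)",
    "Key: 46705: Value: ([46705],2526)",
]

if __name__ == "__main__":
    for antecedent, consequent, confidence in generate_associations(parse_patterns(sample)):
        print(f"{antecedent} -> {consequent}: {confidence:.4f}")
```

With the two sample lines, both directions of the pair can be scored (355/936 for 46485 -> 46705 and 355/2526 for 46705 -> 46485) even though the pair is listed only under key 46485.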