mahout-user mailing list archives

From <praveen.pe...@nokia.com>
Subject RE: Deriving associations from frequent patterns
Date Tue, 09 Nov 2010 18:07:49 GMT
Here is the command I used to run PFPGrowth. I am still using only a single machine; I will
be setting up a Hadoop cluster soon.

$ hadoop jar mahout-examples-0.4-job.jar org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
    -i downloads-input -o reco-patterns-output -k 50 -method mapreduce \
    -g 10 -regex '[\ ]' -s 500

-----Original Message-----
From: ext Robin Anil [mailto:robin.anil@gmail.com] 
Sent: Tuesday, November 09, 2010 1:01 PM
To: user@mahout.apache.org
Subject: Re: Deriving associations from frequent patterns

On Tue, Nov 9, 2010 at 11:20 PM, <praveen.peddi@nokia.com> wrote:

> Hi Anil,
> 1. I am not sure I understand your answer to #1 (or were you asking 
> me a question?). Could you please clarify? The sample patterns I gave 
> are only a small subset of the output. I included only those two 
> features for simplicity.
>
Oh, never mind. Let me take a look.


> 2. I am sending the gzipped sample transaction file (1M downloads) to 
> your private email since I am not sure if I can attach files to the mailing list.
> Please check your email for the sample file.
>
> Praveen
>
> -----Original Message-----
> From: ext Robin Anil [mailto:robin.anil@gmail.com]
> Sent: Tuesday, November 09, 2010 12:40 PM
> To: user@mahout.apache.org
> Subject: Re: Deriving associations from frequent patterns
>
> On Tue, Nov 9, 2010 at 9:50 PM, <praveen.peddi@nokia.com> wrote:
>
> > Hello all,
> > I am new to Mahout. I have just started looking into Mahout to 
> > replace our current FP-growth implementation with the parallel 
> > FP-growth that Mahout provides, since we have started having 
> > scalability issues. I looked at the PFPGrowth documentation and 
> > noticed that it produces only the top K frequent patterns, not the 
> > associations, and what we need is associations. So I was thinking of 
> > implementing a simple AssociationGenerator that takes the frequent 
> > patterns as input. However, I am not sure what the best way is to 
> > generate associations from the frequent patterns produced by Mahout.
> >
> > I have the following sample output from mahout.
> >
> > Key: 46485: Value: ([46485],936), ([46705, 46485],355)
> > Key: 46705: Value: ([46705],2526)
> >
> > We are interested only in item sets of size 2, since we need only 
> > associations with one antecedent and one consequent.
> >
> > I was planning to calculate associations with confidence as follows:
> > for each key above as A {
> >        for each two-item set as [A,C] {
> >                confidence(A->C) = support([A,C]) / support([A]);
> >                add association (A, C, confidence(A->C)) to the list;
> >        }
> > }
> >
> > Keeping the above requirement and pseudocode in mind, my questions 
> > are as follows:
> > 1. Is the above algorithm efficient?
> >
> You are running it over the set of top-K patterns. It's small, so it 
> doesn't matter whether it is efficient or not.
>
> > 2. In the first pattern, [46705, 46485] occurred 355 times, but why 
> > is the same pattern not repeated under the second key? Because of 
> > this, calculating confidence (46705 -> 46485) becomes difficult. As 
> > you can see from the pseudocode above, I was planning to read the 
> > patterns for each feature and calculate the confidence of every 
> > association with that feature as antecedent. But when I read feature 
> > 46705, I cannot calculate the confidence of (46705 -> 46485), since 
> > the item set is not included with that feature.
> >
> Good question. I guess the partitioning is screwing this up, as there 
> are K-1 other patterns in the list with support greater than 355. Can 
> you give a sample to test?
>
> > 3. Has anyone implemented association generation from the generated 
> > frequent patterns?
> >
> Nope
>
> >
> >
> > Thanks
> > Praveen
> >
> >
>
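
The confidence calculation discussed in the quoted thread can be sketched in a few lines of Python. This is only an illustration, not part of Mahout: it assumes the patterns are available as text lines in the same shape as the sample output quoted above, and the names collect_supports and pair_rules are made up for this example. To work around the second question (a pair listed under one key but not the other), it first merges the supports from all keys into one table, then computes confidence(A->C) = support([A,C]) / support([A]) in both directions for every two-item set.

```python
import re

# Sample lines copied from the thread. The textual layout is an assumption;
# real PFPGrowth output would need its own reader.
SAMPLE = """\
Key: 46485: Value: ([46485],936), ([46705, 46485],355)
Key: 46705: Value: ([46705],2526)
"""

# Matches one "([item, item, ...],count)" entry.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")


def collect_supports(lines):
    """Merge the patterns listed under every key into one itemset -> support table."""
    supports = {}
    for line in lines:
        for items, count in PATTERN_RE.findall(line):
            itemset = frozenset(item.strip() for item in items.split(","))
            # The same itemset may appear under several keys; keep one count.
            supports[itemset] = max(supports.get(itemset, 0), int(count))
    return supports


def pair_rules(supports):
    """Return (antecedent, consequent, confidence) for every two-item set."""
    rules = []
    for itemset, pair_support in supports.items():
        if len(itemset) != 2:
            continue
        a, b = sorted(itemset)
        for antecedent, consequent in ((a, b), (b, a)):
            antecedent_support = supports.get(frozenset([antecedent]))
            # Skip the rule if the single-item support was not in the output.
            if antecedent_support:
                rules.append((antecedent, consequent, pair_support / antecedent_support))
    return rules


supports = collect_supports(SAMPLE.splitlines())
rules = pair_rules(supports)
for antecedent, consequent, confidence in rules:
    print(f"{antecedent} -> {consequent}: {confidence:.4f}")
```

Because the supports are looked up in the merged table rather than per key, the rule 46705 -> 46485 can be scored even though the pair is listed only under key 46485. A rule is silently skipped when the antecedent's single-item support is missing from the top-K output, which may or may not be the behaviour you want.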
