> Hi Anil,
> 1. I am not sure if I understand your answer to #1 (or were you asking me a
> question?). Could you please clarify? The sample patterns I gave are only a
> small subset of the output. I included only those two features for
> simplicity.
Oh. Never mind. Let me see
> 2. I am sending the gzipped sample transaction file (1M downloads) to your
> private email since I am not sure if I can attach files to the mailing list.
> Please check your email for the sample file.
> Praveen
> From: ext Robin Anil [mailto:robin.anil@gmail.com]
> Sent: Tuesday, November 09, 2010 12:40 PM
> To: user@mahout.apache.org
> Subject: Re: Deriving associations from frequent patterns
> On Tue, Nov 9, 2010 at 9:50 PM, <praveen.peddi@nokia.com> wrote:
> > Hello all,
> > I am new to Mahout. I have just started looking into Mahout to replace
> > our current FP-growth implementation with the parallel FP-growth that
> > Mahout provides, since we started having scalability issues. I looked at
> > the PFPGrowth documentation and noticed that it only produces the top K
> > frequent patterns but not the associations, and what we need is
> > associations. So I was thinking of implementing a simple
> > AssociationGenerator given the frequent patterns output. However, I am
> > not sure what the best way is to generate associations given the frequent
> > patterns produced by Mahout.
> > I have the following sample output from mahout.
> >
> > Key: 46485: Value: ([46485],936), ([46705, 46485],355)
> > Key: 46705: Value: ([46705],2526)
> > We are interested only in item sets of size 2, since we need only
> > associations with 1 antecedent and 1 consequent.
> >
> > I was planning to calculate associations with confidence as follows:
> > For each key above as A {
> >   for each two-item set as [A,C] {
> >     confidence(A>C) = support([A,C])/support(A);
> >     add association (A, C, confidence(A>C)) to the list;
> >   }
> > }
> >
> > Keeping the above requirement and pseudocode in mind, my questions are as
> > follows:
> > 1. Is the above algorithm efficient?
> >
> You are running it over a set of top K patterns. It's small, so it doesn't
> matter whether it's inefficient or not.
>
> > 2. In the first pattern, [46705, 46485] occurred 355 times, but why is
> > the same item set not repeated in the second pattern? Because of this,
> > calculating confidence (46705 > 46485) becomes difficult. As you can
> > see from the above code, I was planning to read the patterns for each
> > feature and calculate the confidence of every association with that
> > feature as the antecedent. But when I read feature 46705, I cannot
> > calculate the confidence of (46705 > 46485), since the item set is not
> > included with the feature.
> >
> Good question. I guess the partitioning is screwing this up, as there are
> other K1 patterns in the list > 355. Can you give a sample to test?
>
> > 3. Has anyone implemented associations from the generated frequent
> > patterns?
> >
> Nope
>
> > Thanks
> > Praveen
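
For reference, here is a minimal Python sketch of the association step discussed in the quoted message. It is not part of Mahout. It assumes the patterns have already been dumped to text lines in the same form as the sample ("Key: ...: Value: ([items],support), ..."); the function names parse_supports and pair_associations are made up for illustration. To work around the issue raised in question 2, it first collects the supports of all item sets across all keys, so a pair listed under only one of its two features can still be used in both directions. It uses the standard definition confidence(A > C) = support([A,C]) / support(A).

```python
import re

# Matches one "([item, item, ...],support)" entry in a dumped pattern line.
# Assumption: the text layout matches the sample shown in the thread.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")


def parse_supports(lines):
    """Collect item set -> support from every line, regardless of its key."""
    supports = {}
    for line in lines:
        for items, count in PATTERN_RE.findall(line):
            itemset = frozenset(int(i) for i in items.split(","))
            # The same item set may be listed under several keys;
            # keep the largest count seen.
            supports[itemset] = max(supports.get(itemset, 0), int(count))
    return supports


def pair_associations(supports):
    """Return (antecedent, consequent, confidence) for every 2-item set."""
    rules = []
    for itemset, pair_support in supports.items():
        if len(itemset) != 2:
            continue
        a, b = sorted(itemset)
        # Emit both directions: a > b and b > a.
        for antecedent, consequent in ((a, b), (b, a)):
            antecedent_support = supports.get(frozenset([antecedent]))
            # Skip the rule if the single-item support was not in the output.
            if antecedent_support:
                rules.append(
                    (antecedent, consequent, pair_support / antecedent_support)
                )
    return rules


if __name__ == "__main__":
    sample = [
        "Key: 46485: Value: ([46485],936), ([46705, 46485],355)",
        "Key: 46705: Value: ([46705],2526)",
    ]
    for antecedent, consequent, confidence in pair_associations(
        parse_supports(sample)
    ):
        print(f"{antecedent} > {consequent}: {confidence:.4f}")
```

With the two sample lines from the thread, this yields 46485 > 46705 with confidence 355/936 (about 0.38) and 46705 > 46485 with confidence 355/2526 (about 0.14). Note that a pair cut off by the top K limit under every key would still be missing, so this only helps when the pair survives under at least one feature.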
