mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: Deriving associations from frequent patterns
Date Tue, 09 Nov 2010 18:11:29 GMT
Can you try with -g 1 and tell me the result?

On Tue, Nov 9, 2010 at 11:37 PM, <praveen.peddi@nokia.com> wrote:

> Here is the command I used to run PFPGrowth. I am still using only a single
> machine. I will be setting up a Hadoop cluster soon.
>
> $ hadoop jar mahout-examples-0.4-job.jar \
>     org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
>     -i downloads-input -o reco-patterns-output \
>     -k 50 -method mapreduce -g 10 -regex '[\ ]' -s 500
>
> -----Original Message-----
> From: ext Robin Anil [mailto:robin.anil@gmail.com]
> Sent: Tuesday, November 09, 2010 1:01 PM
> To: user@mahout.apache.org
> Subject: Re: Deriving associations from frequent patterns
>
> On Tue, Nov 9, 2010 at 11:20 PM, <praveen.peddi@nokia.com> wrote:
>
> > Hi Anil,
> > 1. I am not sure I understand your answer to #1 (or were you asking
> > me a question?). Could you please clarify? The sample patterns I gave are
> > only a small subset of the output. I included only those two
> > features for simplicity.
> >
> Oh, never mind. Let me see.
>
>
> > 2. I am sending the gzipped sample transaction file (1M downloads) to
> > your private email since I am not sure if I can attach files to the
> > mailing list.
> > Please check your email for the sample file.
> >
> > Praveen
> >
> > -----Original Message-----
> > From: ext Robin Anil [mailto:robin.anil@gmail.com]
> > Sent: Tuesday, November 09, 2010 12:40 PM
> > To: user@mahout.apache.org
> > Subject: Re: Deriving associations from frequent patterns
> >
> > On Tue, Nov 9, 2010 at 9:50 PM, <praveen.peddi@nokia.com> wrote:
> >
> > > Hello all,
> > > I am new to Mahout. I have just started looking into Mahout to
> > > replace our current FP-Growth implementation with the parallel FP-Growth
> > > that Mahout provides, since we started having scalability issues. I
> > > looked at the PFPGrowth documentation and noticed that it only
> > > produces the top K frequent patterns, not the associations, and
> > > associations are what we need. So I was thinking of implementing a simple
> > > AssociationGenerator that takes the frequent patterns output. However, I
> > > am not sure what the best way is to generate associations from the
> > > frequent patterns produced by Mahout.
> > >
> > > I have the following sample output from mahout.
> > >
> > > Key: 46485: Value: ([46485],936), ([46705, 46485],355)
> > > Key: 46705: Value: ([46705],2526)
> > >
> > > We are interested only in item sets of size 2 since we need only
> > > one-antecedent-to-one-consequent associations.
> > >
> > > I was planning to calculate associations with confidence as follows:
> > > For each key above as A {
> > >        for each two-item set as [A,C] {
> > >                confidence(A->C) = support([A,C]) / support([A]);
> > >                add association (A, C, confidence(A->C)) to the list;
> > >        }
> > > }
> > >
> > > Keeping the above requirement and pseudocode in mind, my questions are
> > > as follows:
> > > 1. Is the above algorithm efficient?
> > >
> > You are running it over the set of top-K patterns, which is small, so it
> > doesn't matter much whether it is efficient or not.
> >
> > > 2. In the first key's list, [46705, 46485] occurred 355 times, but why is
> > > the same pattern not repeated in the second key's list? Because of this,
> > > calculating confidence(46705 -> 46485) becomes difficult. As you can see
> > > from the code above, I was planning to read the patterns for each
> > > feature and calculate the confidence of all associations with that
> > > feature as the antecedent. But when I read feature 46705, I cannot
> > > calculate the confidence of (46705 -> 46485) since the item set is not
> > > included with that feature.
> > >
> > Good question. My guess is that the partitioning is causing this, as
> > there are K-1 other patterns in that list with support > 355. Can you
> > give me a sample to test?
> >
> > > 3. Has anyone implemented association generation from the generated
> > > frequent patterns?
> > >
> > Nope
> >
> > >
> > >
> > > Thanks
> > > Praveen
> > >
> > >
> >
>
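
The association step discussed in this thread can be sketched in Java as below. This is a minimal, hypothetical sketch, not Mahout code: it assumes the pattern output has been dumped to text lines shaped like the sample quoted above (`Key: ...: Value: ([items],support), ...`), and the class `AssociationSketch`, its `rules` method and the `Rule` record are illustrative names (the record needs Java 16+). It uses the standard definition confidence(A -> C) = support({A, C}) / support({A}). Instead of computing rules key by key, it first collects every single-item and two-item support from all keys into global tables, so a pair that appears under only one key (like [46705, 46485] under 46485) still yields rules in both directions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: 1-to-1 association rules from dumped top-K pattern lines. */
public class AssociationSketch {

    /** A one-antecedent, one-consequent rule with its confidence. */
    public record Rule(String antecedent, String consequent, double confidence) {}

    // Matches one "([item, item, ...],support)" entry; the line format is an assumption
    // based on the sample output in the thread.
    private static final Pattern ENTRY = Pattern.compile("\\(\\[([^\\]]*)\\],\\s*(\\d+)\\)");

    public static List<Rule> rules(List<String> dumpLines) {
        Map<String, Long> single = new HashMap<>(); // support of {A}
        Map<String, Long> pair = new HashMap<>();   // support of {A, C}, keyed "A|C" in sorted order

        // Pass 1: gather supports from ALL keys, since a pair may be listed under only one of its items.
        for (String line : dumpLines) {
            Matcher m = ENTRY.matcher(line);
            while (m.find()) {
                String[] items = m.group(1).split(",\\s*");
                long support = Long.parseLong(m.group(2));
                // The same itemset can appear under several keys; keep a single value for it.
                if (items.length == 1) {
                    single.merge(items[0].trim(), support, Long::max);
                } else if (items.length == 2) {
                    pair.merge(pairKey(items[0].trim(), items[1].trim()), support, Long::max);
                }
                // Larger itemsets are ignored: only 1-antecedent -> 1-consequent rules are needed.
            }
        }

        // Pass 2: each pair {A, C} yields A -> C and C -> A when the antecedent's support is known.
        List<Rule> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : pair.entrySet()) {
            String[] ac = e.getKey().split("\\|");
            for (int i = 0; i < 2; i++) {
                String a = ac[i];
                String c = ac[1 - i];
                Long supportA = single.get(a);
                if (supportA != null && supportA > 0) {
                    out.add(new Rule(a, c, (double) e.getValue() / supportA));
                }
            }
        }
        return out;
    }

    // Order-independent key so [46705, 46485] and [46485, 46705] map to the same entry.
    private static String pairKey(String x, String y) {
        return x.compareTo(y) <= 0 ? x + "|" + y : y + "|" + x;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Key: 46485: Value: ([46485],936), ([46705, 46485],355)",
                "Key: 46705: Value: ([46705],2526)");
        for (Rule r : rules(sample)) {
            System.out.printf("%s -> %s : %.3f%n", r.antecedent(), r.consequent(), r.confidence());
        }
    }
}
```

On the two sample lines, this gives confidence(46485 -> 46705) = 355/936 ≈ 0.379 and confidence(46705 -> 46485) = 355/2526 ≈ 0.141. The global tables only help for pairs that survived the top-K cut in at least one key's list; a pair dropped from every list is still missed, which is the truncation effect suspected above.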
