mahout-user mailing list archives

From <praveen.pe...@nokia.com>
Subject Re: Deriving associations from frequent patterns
Date Wed, 10 Nov 2010 12:44:03 GMT
Ok thanks Anil.

Please let me know if you need anything else from me regarding my original question about
calculating association rules, and what can be done so that the output contains the necessary information.

Praveen
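
[Editor's sketch, not part of the original message.] One way to get 1-antecedent-to-1-consequent rules from the pattern dump discussed in this thread is to ignore which key a pattern was listed under, collect every itemset into one support table, and then compute confidence(A -> C) = support({A, C}) / support({A}) for both directions of each pair. The function names below are hypothetical, the input is assumed to be a text dump in the "Key: ... Value: ([items],count), ..." form quoted in this thread, and keeping the largest count when an itemset appears twice is an arbitrary choice made for this sketch.

```python
import re

# Matches one "([item, item, ...],support)" entry in the printed pattern list.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")


def parse_patterns(text):
    """Collect every itemset and its support, regardless of the key it was listed under."""
    supports = {}
    for items, count in PATTERN_RE.findall(text):
        itemset = frozenset(int(x) for x in items.split(","))
        # Assumption: if an itemset is reported more than once, keep the largest count.
        supports[itemset] = max(supports.get(itemset, 0), int(count))
    return supports


def pair_rules(supports):
    """Return (antecedent, consequent, confidence) for each direction of every 2-itemset."""
    rules = []
    for itemset, pair_support in supports.items():
        if len(itemset) != 2:
            continue
        a, b = sorted(itemset)
        for antecedent, consequent in ((a, b), (b, a)):
            antecedent_support = supports.get(frozenset([antecedent]))
            if antecedent_support:  # skip rules whose antecedent support was not reported
                rules.append((antecedent, consequent, pair_support / antecedent_support))
    return rules


# The two sample lines from the original question.
sample = """
Key: 46485: Value: ([46485],936), ([46705, 46485],355)
Key: 46705: Value: ([46705],2526)
"""

rules = pair_rules(parse_patterns(sample))
for antecedent, consequent, confidence in sorted(rules):
    print(f"{antecedent} -> {consequent}: {confidence:.3f}")
```

On the two sample lines this prints 46485 -> 46705 with confidence 0.379 (355/936) and 46705 -> 46485 with confidence 0.141 (355/2526). Because the support table is global, it does not matter that [46705, 46485] is listed only under key 46485; the rule can only be produced, though, if the pair survives into the top-K list of at least one key.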

On Nov 9, 2010, at 11:17 PM, ext Robin Anil <robin.anil@gmail.com> wrote:

> g is the number of groups in which features get divided so that the total
> size of transactions in bytes is almost equal in each reducer. See the
> PFPGrowth paper. With g=1 you get the original fpgrowth. I usually suggest a
> g size == numfeatures / (10 or 20) so as to make parallel fpgrowth scalable
> and still get similar results as the sequential one.
> 
> Robin
> 
> On Wed, Nov 10, 2010 at 12:23 AM, <praveen.peddi@nokia.com> wrote:
> 
>> Hi Anil,
>> Here is the result for the same features with g=1
>> Key: 46705: Value: ([46705],2526), ([46705, 46840],698)
>> Key: 46485: Value: ([46485],936), ([46705, 46485],355), ([46840,
>> 46485],329), ([46847, 46485],211), ([46705, 46840, 46485],207), ([46485,
>> 46815],175), ([46485, 46852],159), ([46840, 46847, 46485],130), ([46705,
>> 46847, 46485],126), ([46705, 46485, 46815],105), ([46840, 46485, 46815],97),
>> ([46840, 46485, 46852],96), ([46847, 46485, 46815],94), ([46705, 46485,
>> 46852],93), ([46705, 46840, 46847, 46485],92), ([20975, 46485],92), ([16794,
>> 46485],80), ([46847, 46485, 46852],76), ([46705, 46840, 46485, 46815],75),
>> ([46485, 46852, 46815],75), ([46705, 46840, 46485, 46852],69), ([20924,
>> 46485],68), ([46705, 46847, 46485, 46815],67), ([46840, 46847, 46485,
>> 46815],66), ([20975, 46705, 46840, 46485],65), ([46840, 46847, 46485,
>> 46852],56), ([20975, 46705, 46485],55), ([20975, 46840, 46485],54), ([46705,
>> 46840, 46847, 46485, 46815],53)
>> 
>> Full Result for same features when g=500 is:
>> Key: 46705: Value: ([46705],2526)
>> Key: 46485: Value: ([46485],936), ([46705, 46485],355), ([46840,
>> 46485],329), ([46847, 46485],211), ([46705, 46840, 46485],205), ([46840,
>> 46847, 46485],127), ([46705, 46847, 46485],124), ([20975, 46485],92),
>> ([46705, 46840, 46847, 46485],90), ([20975, 46705, 46485],55), ([20975,
>> 46840, 46485],54), ([21243, 46485],47), ([20975, 46705, 46840, 46485],43),
>> ([39140, 46485],37), ([20975, 46847, 46485],31), ([20975, 46840, 46847,
>> 46485],27), ([20975, 46705, 46847, 46485],26), ([20975, 46705, 46840, 46847,
>> 46485],23), ([27984, 46705, 46485],23), ([21243, 46840, 46485],22), ([21243,
>> 46705, 46485],21), ([39140, 46840, 46485],19), ([21243, 46847, 46485],18),
>> ([39140, 46705, 46485],15), ([21243, 46705, 46840, 46485],14), ([6942,
>> 46485],14), ([21243, 46840, 46847, 46485],13), ([39140, 46847, 46485],13),
>> ([39140, 46840, 46847, 46485],11), ([20975, 39140, 46485],11), ([20975,
>> 21243, 46485],11), ([39140, 46705, 46840, 46485],10), ([27984, 46705, 46840,
>> 46847, 46485],9), ([39140, 46705, 46847, 46485],9), ([20975, 27984, 46705,
>> 46485],8), ([39140, 46705, 46840, 46847, 46485],7), ([20975, 27984, 46705,
>> 46840, 46485],7), ([21243, 46705, 46847, 46485],7), ([20975, 39140, 46840,
>> 46485],7), ([6942, 46705, 46485],7), ([21243, 46705, 46840, 46847,
>> 46485],6), ([20975, 21243, 46840, 46847, 46485],6), ([21243, 27984,
>> 46485],6), ([39140, 27984, 46485],6), ([6942, 46840, 46485],6), ([20975,
>> 27984, 46705, 46847, 46485],5), ([39140, 27984, 46847, 46485],5), ([20975,
>> 39140, 46705, 46485],5), ([21243, 39140, 46485],5), ([4873, 46485],5)
>> 
>> The results are obviously different. This raises another question. Are the
>> frequent patterns supposed to change with different values of g?
>> 
>> Praveen
>> 
>> -----Original Message-----
>> From: ext Robin Anil [mailto:robin.anil@gmail.com]
>> Sent: Tuesday, November 09, 2010 1:11 PM
>> To: user@mahout.apache.org
>> Subject: Re: Deriving associations from frequent patterns
>> 
>> Can you try with g=1 and tell me the result?
>> 
>> On Tue, Nov 9, 2010 at 11:37 PM, <praveen.peddi@nokia.com> wrote:
>> 
>>> Here is the command I used to run PFPGrowth. I am still using only a
>>> single machine. I will be setting up a Hadoop cluster soon.
>>> 
>>> $ hadoop jar mahout-examples-0.4-job.jar \
>>>     org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
>>>     -i downloads-input -o reco-patterns-output \
>>>     -k 50 -method mapreduce -g 10 -regex '[\ ]' -s 500
>>> 
>>> -----Original Message-----
>>> From: ext Robin Anil [mailto:robin.anil@gmail.com]
>>> Sent: Tuesday, November 09, 2010 1:01 PM
>>> To: user@mahout.apache.org
>>> Subject: Re: Deriving associations from frequent patterns
>>> 
>>> On Tue, Nov 9, 2010 at 11:20 PM, <praveen.peddi@nokia.com> wrote:
>>> 
>>>> Hi Anil,
>>>> 1. I am not sure if I understand your answer to #1 (or were you
>>>> asking me a question?). Could you please clarify? The sample patterns
>>>> I gave are only a small subset of the output. I included only those
>>>> two features for simplicity.
>>>> 
>>> Oh. Never mind. Let me see
>>> 
>>> 
>>>> 2. I am sending the gzipped sample transaction file (1M downloads)
>>>> to your private email since I am not sure if I can attach files to
>>>> the mailing list.
>>>> Please check your email for the sample file.
>>>> 
>>>> Praveen
>>>> 
>>>> -----Original Message-----
>>>> From: ext Robin Anil [mailto:robin.anil@gmail.com]
>>>> Sent: Tuesday, November 09, 2010 12:40 PM
>>>> To: user@mahout.apache.org
>>>> Subject: Re: Deriving associations from frequent patterns
>>>> 
>>>> On Tue, Nov 9, 2010 at 9:50 PM, <praveen.peddi@nokia.com> wrote:
>>>> 
>>>>> Hello all,
>>>>> I am new to Mahout. I have just started looking into Mahout to
>>>>> replace our current FP-growth implementation with the parallel
>>>>> FP-growth that Mahout provides, since we started having scalability
>>>>> issues. I looked at the PFPGrowth documentation and noticed that it
>>>>> only produces the top K frequent patterns but not the associations,
>>>>> and what we need is associations. So I was thinking of implementing
>>>>> a simple AssociationGenerator given the frequent patterns output.
>>>>> However, I am not sure what the best way is to generate
>>>>> associations given the frequent patterns produced by Mahout.
>>>>> 
>>>>> I have the following sample output from mahout.
>>>>> 
>>>>> Key: 46485: Value: ([46485],936), ([46705, 46485],355)
>>>>> Key: 46705: Value: ([46705],2526)
>>>>> 
>>>>> We are interested only in item sets of size 2, since we need only
>>>>> 1-antecedent-to-1-consequent associations.
>>>>> 
>>>>> I was planning to calculate associations with confidence as follows:
>>>>> for each key above as A {
>>>>>       for each two-item set containing A, as [A, C] {
>>>>>               // confidence divides by the support of the antecedent A
>>>>>               confidence(A->C) = support([A, C]) / support([A]);
>>>>>               add association (A, C, confidence(A->C)) to the list;
>>>>>       }
>>>>> }
>>>>> 
>>>>> Keeping the above requirement and pseudocode in mind, my questions
>>>>> are as follows:
>>>>> 1. Is the above algorithm efficient?
>>>>> 
>>>> You are running it over a set of top-K patterns. It's small, so it
>>>> doesn't matter whether it's inefficient or not.
>>>> 
>>>>> 2. In the first pattern, [46705, 46485] occurred 355 times, but why
>>>>> is the same pattern not repeated under the second key? Because of
>>>>> this, calculating confidence (46705 -> 46485) becomes difficult. As
>>>>> you can see from the above code, I was planning to read the patterns
>>>>> for each feature and calculate the confidence of all associations
>>>>> with that feature as antecedent. But when I read feature 46705, I
>>>>> cannot calculate the confidence of (46705 -> 46485) since the item
>>>>> set is not included with the feature.
>>>>> 
>>>> Good question. I guess the partitioning is screwing this up, as there
>>>> are K-1 other patterns in the list with support > 355. Can you give a
>>>> sample to test?
>>>> 
>>>>> 3. Has anyone implemented associations from the generated frequent
>>>>> patterns?
>>>>> 
>>>> Nope
>>>> 
>>>>> 
>>>>> 
>>>>> Thanks
>>>>> Praveen
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
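
[Editor's sketch, not part of the original thread.] To see exactly how two PFPGrowth runs differ, for example the g=1 and g=500 outputs quoted above, the two dumps can be parsed into support tables and compared itemset by itemset. The helper names are hypothetical, the input is assumed to be text in the "Key: ... Value: ([items],count), ..." form shown in this thread, and the data below is a short excerpt of the quoted results rather than the full output.

```python
import re

# Matches one "([item, item, ...],support)" entry in the printed pattern list.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")


def parse_patterns(text):
    """Map each itemset (as a frozenset of ints) to its reported support."""
    return {
        frozenset(int(x) for x in items.split(",")): int(count)
        for items, count in PATTERN_RE.findall(text)
    }


def compare_runs(run_a, run_b):
    """Return itemsets unique to each run, and shared itemsets whose counts differ."""
    only_a = sorted(sorted(s) for s in run_a.keys() - run_b.keys())
    only_b = sorted(sorted(s) for s in run_b.keys() - run_a.keys())
    changed = {
        tuple(sorted(s)): (run_a[s], run_b[s])
        for s in run_a.keys() & run_b.keys()
        if run_a[s] != run_b[s]
    }
    return only_a, only_b, changed


# Excerpts of the g=1 and g=500 results quoted in the thread.
g1 = parse_patterns(
    "Key: 46705: Value: ([46705],2526), ([46705, 46840],698)\n"
    "Key: 46485: Value: ([46485],936), ([46705, 46485],355), ([46705, 46840, 46485],207)"
)
g500 = parse_patterns(
    "Key: 46705: Value: ([46705],2526)\n"
    "Key: 46485: Value: ([46485],936), ([46705, 46485],355), ([46705, 46840, 46485],205)"
)

only_g1, only_g500, changed = compare_runs(g1, g500)
print("only in g=1:", only_g1)
print("only in g=500:", only_g500)
print("different counts:", changed)
```

On this excerpt the comparison reports [46705, 46840] as present only in the g=1 run, and the three-item set {46485, 46705, 46840} as having support 207 with g=1 versus 205 with g=500, which matches the differences visible in the quoted output.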
