mahout-user mailing list archives

From Vipul Pandey <vipan...@gmail.com>
Subject Re: PFPGrowth - weird output?
Date Sat, 05 Feb 2011 17:22:10 GMT
Hey Praveen, 

thanks for responding.

> Frequent patterns are reported per feature, which is why you are seeing the two patterns
> twice. The first one is for feature 1518311 and the second one is for feature 1476937.
That's what I thought but then different support values made me dizzy! 

Also, it seems like it's not just about reporting the pattern for each feature but for each
combination of features:
> 22 *1476937* 720020 *1518311*
> 30 *1518311* *1476937* 720020
> 30 720020 *1518311* *1476937*
> 34 720020 *1476937* *1518311*
> 38 *1518311* 720020 *1476937*
> 42 *1476937* *1518311* 720020
Here you can see each possible permutation of the three items registering different support.
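
To make the tally concrete, here is a minimal Python sketch that groups the rows quoted in this thread by unordered itemset. The (support, items) layout is my own parsed form, not Mahout's native output format, and the function name group_by_itemset is made up for illustration. It only reproduces the arithmetic; it does not show that summing is the right way to recover the pair's support.

```python
from collections import defaultdict

# (support, items) rows copied from the output quoted in this thread.
rows = [
    (22, ["1476937", "720020", "1518311"]),
    (30, ["1518311", "1476937", "720020"]),
    (30, ["720020", "1518311", "1476937"]),
    (34, ["720020", "1476937", "1518311"]),
    (38, ["1518311", "720020", "1476937"]),
    (42, ["1476937", "1518311", "720020"]),
    (234, ["1518311", "1476937"]),
    (254, ["1476937", "1518311"]),
]


def group_by_itemset(rows):
    """Collect reported supports under an order-independent itemset key."""
    grouped = defaultdict(list)
    for support, items in rows:
        # frozenset ignores item order, so all permutations share one key.
        grouped[frozenset(items)].append(support)
    return dict(grouped)


grouped = group_by_itemset(rows)
pair = frozenset(["1476937", "1518311"])
triple = pair | {"720020"}

pair_total = sum(grouped[pair])        # the two 2-item rows
triple_total = sum(grouped[triple])    # the six 3-item permutations
all_total = pair_total + triple_total  # the 684 total from the original mail

print(len(grouped[triple]), "orderings of the triple, total", triple_total)
print(len(grouped[pair]), "orderings of the pair, total", pair_total)
print("everything containing the pair:", all_total)
```

The eight rows collapse to two distinct itemsets, and the grand total matches the 684 from my first mail, which is still short of the 766 counted directly from the baskets.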



> Are you running on a multi-node Hadoop cluster? If so, did you read all the output files?
I ran locally and then on a small 4-node cluster. I'm reading the parts file under the
frequentpatterns directory.

Let me try to run it on a smaller scale and get you the output soon.
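
For the small-scale check, a reference count taken straight from the baskets could look like the sketch below. The helper name support and the four baskets are invented placeholders for illustration only (they are not my real data); the idea is just to count the baskets that contain every item of the itemset, regardless of order.

```python
def support(baskets, itemset):
    """Count the baskets that contain every item in itemset (order ignored)."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= set(basket))


# Hypothetical baskets, made up for this example; not the real input data.
baskets = [
    ["1476937", "1518311", "720020"],
    ["1518311", "1476937"],
    ["1476937", "55843184"],
    ["720020", "1518311"],
]

pair_support = support(baskets, ["1476937", "1518311"])
triple_support = support(baskets, ["1476937", "1518311", "720020"])

print("pair support:", pair_support)
print("triple support:", triple_support)
```

A count like this is what I would compare against the numbers in the frequentpatterns output on the smaller run.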

Thanks!
Vipul

On Feb 3, 2011, at 6:44 PM, <praveen.peddi@nokia.com> wrote:

> Hi Vipul,
> Frequent patterns are reported per feature, which is why you are seeing the two patterns
> twice. The first one is for feature 1518311 and the second one is for feature 1476937.
> 
> However, both should have exactly the same support. I am not sure why you have different
> support for the same item set. Maybe if you send the full output from Mahout as it is, we
> could take a look.
> 
> Are you running on a multi-node Hadoop cluster? If so, did you read all the output files?
> 
> Praveen
> ________________________________________
> From: ext Vipul Pandey [vipandey@gmail.com]
> Sent: Thursday, February 03, 2011 8:21 PM
> To: user@mahout.apache.org
> Subject: PFPGrowth - weird output?
> 
> Hi all!
> 
> I'm trying to run PFPgrowth on my data and this is an output I get. (Please
> note that I parse the output in frequentpatterns folder and generate this
> output with the support followed by the itemset)
> 
> support : Itemset
> *234     1518311    1476937  *
> 235     55843184
> 238     1238079
> 244     34541
> 247     4516454
> 252     106478
> 252     670864
> *254     1476937   1518311  *
> 
> You can see that two items are reported twice (*1518311    1476937*) with
> different supports.
> 
> And below are all the occurrences of these two items together ... if you
> notice, it has all the permutations of the three items (*1476937* *720020*
> *1518311*)
> 
> 22 *1476937* 720020 *1518311*
> 30 *1518311* *1476937* 720020
> 30 720020 *1518311* *1476937*
> 34 720020 *1476937* *1518311*
> 38 *1518311* 720020 *1476937*
> 42 *1476937* *1518311* 720020
> 234 *1518311* *1476937*
> 254 *1476937* *1518311*
> 
> Does this mean that to get the support of just the pair (*1476937*
> *1518311*) I will have to add all of them up!?
> 
> Even in that case ... this total comes out to *684*, and if I count the
> number of co-occurrences of these two items in the original baskets the
> support is *766*. Why is there a difference? Any idea?
> 
> 
> Thanks!
> Vipul

