mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robin Anil <robin.a...@gmail.com>
Subject Re: about the sourcecode of PFPGrowth
Date Thu, 29 Jul 2010 02:11:21 GMT
Dont think too much on the naming conventions. There is a little more to the
story. For example for each feature growth() is called which, along with the
other two are optimized to mine TopK frequent patterns for a particular
feature(here called as currentAttribute).
For the returned patterns from TopDown and BottomUp, the merge step is done
differently (optimized again for speed to remove some redundant checks) if
either the attribute is having a higher support or a lower support than
the current Attribute.

At the moment, I am bit disconnected from the code. Have to read up to
explain more. Feel free to throw more questions, that way I it will help me
will revisit them and write more documentation about it.

Robin


On Wed, Jul 28, 2010 at 7:00 PM, Andrew Wang <andrew.wang.1900@gmail.com>wrote:

> Hi, Robin.
>
> Thanks for your  wonderfulreply ! it makes me understand this algrithms
> further.
>
> But, i still have some questions.
>
> On Thu, Jul 29, 2010 at 6:37 AM, Robin Anil <robin.anil@gmail.com> wrote:
>
> > This is not regular FPGrowth. This has many other super improvements ;)
> >
> > 1) Its a faster method for mining of Top K patterns for each unique item:
> > How does it do it? First it makes the conditional tree for each feature
> in
> >
>
> I think the growth() function works in this way.
>
>
> > Bottom up manner(like the paper). Then it mines the conditional tree in
> top
> >
>
> and, the  growthTopDown() function works as this manner( from top nodes to
> mining the frequent patterns, as you said, it will make the mining process
> fast).
>
> But i still can't understand why this function should  use the
> growthBottomUp(). it seems that this growthBottomUp() function works like
> the growth() function from bottom node to top in the headerTable.
>
> growth() -> growthTopDown() -> growthBottomUp()
>
> Maybe, you just can help to explain how the growthBottomUp() works here.
>
> Thanks!
>
> down manner. This ensures it mines the top frequent sub patterns into a
> Heap
> > or in this case a PriorityQueue which allows me to remove the least
> > frequent
> > item easily. So everytime my queue is full and the support of the item in
> > the tree falls below that of the first entry in the Queue, I stop mining.
> > This makes it really really fast.  Also I try to mine only the closed
> > patterns. See below:
> > If,
> > A,B,C => 4
> > and
> > A, B => 4.
> > I select only the first as second is already part of first.
> >
> >
> > 2) The intermediate data which gets thrown in the map/reduces looks like
> an
> > FPTree called the Pattern tree. So I am able to compress more data making
> > the whole process faster
> >
> > 3). While making conditional trees, I try to zip them up and prune nodes
> > which fall below the min support. This is called FP-Bonsai algorithm.
> This
> > is blazing on tree which have a huge depth.(i.e the input data is having
> > longer transactions)
> >
> >
> > Hope this clears up your questions. Please take a look at the original
> > https://issues.apache.org/jira/browse/MAHOUT-157 issue. You will see how
> I
> > progressed from standard FPGrowth to this. Search mailing lists for older
> > discussions
> >
> > Robin
> >
> > On Tue, Jul 27, 2010 at 11:55 PM, Andrew Wang <
> andrew.wang.1900@gmail.com
> > >wrote:
> >
> > > Ye, thanks for your reply, Ted.
> > >
> > > As you know, after construct the FP-tree, we will use the growth
> fuction
> > to
> > > produce the frequent patterns.
> > >
> > > As the paper (Mining Frequent Patterns without Candidate Generation)
> > > described, we should get the patterns from the bottom of the
> > > HeaderTableAttributes (which always are leaf nodes in the pt-tree).
> > >
> > > So, maybe we just need to growthBottomUp() function. But, i also see
> > > growthTopDown() function in the FPGrowth.java.
> > >
> > > I just confused with this two functions, what are the differeces
> between
> > > them?
> > >
> > > Wish i describe my question clearly now.
> > >
> > > thanks
> > >
> > > On Wed, Jul 28, 2010 at 2:29 PM, Ted Dunning <ted.dunning@gmail.com>
> > > wrote:
> > >
> > > > What is it that you don't understand about it?
> > > >
> > > > If you can give specific questions, you are much more likely to get a
> > > good
> > > > answer.  The paper that you already have should give you most of the
> > > > general
> > > > hints that you need.  From there, you should provide specific
> > questions.
> > > >
> > > > On Tue, Jul 27, 2010 at 10:13 PM, Andrew Wang <
> > > andrew.wang.1900@gmail.com
> > > > >wrote:
> > > >
> > > > >        Now, only the growth(FPTree tree, MutableLong
> > minSupportMutable,
> > > > int
> > > > > k, FPTreeDepthCache treeCache, int level, int currentAttribute,
> > > > > StatusUpdater updater) function in the FPGrowth.jave cannot be
> > > understood
> > > > > by
> > > > > myself.
> > > > >
> > > > >        Would you please tell how it works? or give me some papers
> to
> > > help
> > > > > me to understand it? thanks´╝ü
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > http://anqiang1900.blog.163.com/
> > >
> >
>
>
>
> --
> http://anqiang1900.blog.163.com/
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message