mahout-dev mailing list archives

From "Robin Anil (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-221) Implementation of FP-Bonsai Pruning for fast pattern mining
Date Tue, 05 Jan 2010 02:52:54 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796504#action_12796504 ]

Robin Anil commented on MAHOUT-221:
-----------------------------------

I am going to commit this. It is a major change, and I need it committed before making minor tweaks.

> Implementation of FP-Bonsai Pruning for fast pattern mining
> -----------------------------------------------------------
>
>                 Key: MAHOUT-221
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-221
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.2
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-FPGROWTH.patch, MAHOUT-FPGROWTH.patch
>
>
> FP-Bonsai is a method to prune long-chained FP-Trees for faster growth.
> http://win.ua.ac.be/~adrem/bibrem/pubs/fpbonsai.pdf
> This implementation also adds a transaction preprocessing map/reduce job which converts
> a list of transactions {1, 2, 4, 5}, {1, 2, 3}, {1, 2} into a tree structure and thus saves
> space during the fpgrowth map/reduce. The tree formed from the above, with each node
> written as (item, count), is:
>
> (1,3) -> (2,3) -+- (4,1) - (5,1)
>                 |
>                 +- (3,1)
>
> For typical data this greatly reduces storage space and thus saves time during shuffle
> and sort.
> This patch also adds a reducer to PFPgrowth (not part of the original paper) which performs
> this compression and saves space.
> This patch also adds an example transaction dataset generator for the flickr and delicious
> data sets: https://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Research/DataSets/PINTSExperimentsDataSets/
> Both are gigabyte-scale tag data sets in which each record is given as "date userid itemid tag".
> The example maker creates one transaction from all the unique tags a user has applied to an item.
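The example maker described above can be illustrated with a minimal sketch. This is not the code from the patch: the class name `TagTransactionSketch`, the method `toTransactions`, and the sample records are hypothetical, and the sketch assumes whitespace-separated fields with a single-token date.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the generator shipped in the patch.
final class TagTransactionSketch {

    /**
     * Groups "date userid itemid tag" records by (userid, itemid) and
     * collects the unique tags of each group into one transaction.
     * Assumes whitespace-separated fields and a date without spaces.
     */
    static Map<String, Set<String>> toTransactions(List<String> lines) {
        Map<String, Set<String>> transactions = new LinkedHashMap<>();
        for (String line : lines) {
            // Limit of 4 keeps any whitespace inside the tag field intact.
            String[] fields = line.trim().split("\\s+", 4);
            if (fields.length < 4) {
                continue; // skip malformed records
            }
            String key = fields[1] + "\t" + fields[2]; // userid + itemid
            transactions.computeIfAbsent(key, k -> new TreeSet<>()).add(fields[3]);
        }
        return transactions;
    }

    public static void main(String[] args) {
        // Made-up sample records in the "date userid itemid tag" layout.
        List<String> lines = Arrays.asList(
            "2006-01-01 u1 i1 beach",
            "2006-01-02 u1 i1 sunset",
            "2006-01-03 u1 i1 beach",
            "2006-01-01 u2 i1 beach");
        for (Map.Entry<String, Set<String>> e : toTransactions(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Using a set per (userid, itemid) key drops the repeated "beach" tag, so each transaction contains every tag at most once.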
>          
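The transaction-to-tree conversion in the quoted description can be sketched as a simple prefix tree in which shared prefixes are stored once with a count. This is an illustration under assumptions, not the map/reduce job from the patch: the names `TransactionTreeSketch`, `Node`, `build`, `countNodes`, and `render` are hypothetical, and the items in each transaction are assumed to be already in a consistent order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the data structure used in the patch.
final class TransactionTreeSketch {

    /** One tree node: an item, how many transactions pass through it, and its children. */
    static final class Node {
        final int item;
        int count;
        final Map<Integer, Node> children = new TreeMap<>();

        Node(int item) {
            this.item = item;
        }
    }

    /** Inserts every transaction as a path from the root, incrementing counts on shared prefixes. */
    static Node build(List<List<Integer>> transactions) {
        Node root = new Node(-1); // sentinel root, holds no item
        for (List<Integer> transaction : transactions) {
            Node current = root;
            for (int item : transaction) {
                current = current.children.computeIfAbsent(item, k -> new Node(k));
                current.count++;
            }
        }
        return root;
    }

    /** Counts the nodes below the given node (the root itself is not counted). */
    static int countNodes(Node node) {
        int total = 0;
        for (Node child : node.children.values()) {
            total += 1 + countNodes(child);
        }
        return total;
    }

    /** Appends one "(item,count)" line per node, indented by depth. */
    static void render(Node node, int depth, StringBuilder out) {
        for (Node child : node.children.values()) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append('(').append(child.item).append(',').append(child.count).append(")\n");
            render(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // The three transactions from the issue description.
        List<List<Integer>> transactions = Arrays.asList(
            Arrays.asList(1, 2, 4, 5),
            Arrays.asList(1, 2, 3),
            Arrays.asList(1, 2));
        Node root = build(transactions);
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        System.out.print(out);
        System.out.println("nodes: " + countNodes(root));
    }
}
```

For the three transactions in the description, the nine item occurrences collapse into five nodes, which is the kind of saving the description attributes to the tree form.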

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

