Message-ID: <1128548103.34931262659974366.JavaMail.jira@brutus.apache.org>
Date: Tue, 5 Jan 2010 02:52:54 +0000 (UTC)
From: "Robin Anil (JIRA)"
To: mahout-dev@lucene.apache.org
Subject: [jira] Commented: (MAHOUT-221) Implementation of FP-Bonsai Pruning for fast pattern mining
In-Reply-To: <318178103.1260717318285.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/MAHOUT-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796504#action_12796504 ]

Robin Anil commented on MAHOUT-221:
-----------------------------------

I am going to commit this. It is a major change, and I need it in place before making minor tweaks.

> Implementation of FP-Bonsai Pruning for fast pattern mining
> -----------------------------------------------------------
>
>                 Key: MAHOUT-221
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-221
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Frequent Itemset/Association Rule Mining
>   Affects Versions: 0.2
>            Reporter: Robin Anil
>            Assignee: Robin Anil
>             Fix For: 0.3
>
>         Attachments: MAHOUT-FPGROWTH.patch, MAHOUT-FPGROWTH.patch
>
>
> FP-Bonsai is a method to prune long-chained FP-Trees for faster growth.
> http://win.ua.ac.be/~adrem/bibrem/pubs/fpbonsai.pdf
> This implementation also adds a transaction preprocessing map/reduce job which converts a list of transactions {1, 2, 4, 5}, {1, 2, 3}, {1, 2} into a tree structure and thus saves space during the fpgrowth map/reduce.
> The tree formed from the above is shown below. For typical data this greatly reduces storage space and thus saves time during shuffle and sort.
> (1,3) -> (2,3) | - (4,1) - (5,1)
>                    (3,1)
> Also added a reducer to PFPgrowth (not part of the original paper) which does this compression and saves space.
> This patch also adds an example transaction dataset generator for the flickr and delicious data sets: https://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Research/DataSets/PINTSExperimentsDataSets/
> Both are gigabyte-scale tag data sets in which each record is given as "date userid itemid tag". The example maker creates one transaction from all the unique tags a user has applied to an item.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
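
The transaction-to-tree compression described in the quoted issue can be illustrated with a small sketch. This is not the code from MAHOUT-FPGROWTH.patch (which is Java and runs as a map/reduce job); it is a minimal in-memory Python illustration, and the names TreeNode, build_transaction_tree, as_nested and count_nodes are hypothetical. It assumes every transaction lists its items in the same consistent order, so that shared prefixes collapse into shared nodes, each carrying an (item, count) pair as in the issue's example.

```python
class TreeNode:
    """One prefix-tree node: an item plus the number of transactions passing through it."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # maps item -> TreeNode


def build_transaction_tree(transactions):
    """Merge transactions into a prefix tree (hypothetical helper, not Mahout's API).

    Assumes each transaction is already in a consistent item order.
    """
    root = TreeNode()
    for transaction in transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = TreeNode(item)
                node.children[item] = child
            child.count += 1  # one more transaction shares this prefix
            node = child
    return root


def as_nested(node):
    """Render the tree as nested dicts keyed by (item, count) for inspection."""
    return {(c.item, c.count): as_nested(c) for c in node.children.values()}


def count_nodes(node):
    """Count the nodes below the (empty) root."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# The three transactions from the issue description.
transactions = [[1, 2, 4, 5], [1, 2, 3], [1, 2]]
root = build_transaction_tree(transactions)

print(as_nested(root))
print(count_nodes(root), "nodes for", sum(len(t) for t in transactions), "item occurrences")
```

For the three example transactions, the nine item occurrences collapse into five nodes, which is the kind of saving the issue attributes to the preprocessing job.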