mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sean Owen (Commented) (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-890) Performance issue in FPGrowth
Date Sun, 20 Nov 2011 12:17:51 GMT


Sean Owen commented on MAHOUT-890:

Do you have a suggested fix, or is it more of an observation? I didn't see a change in the
> Performance issue in FPGrowth
> -----------------------------
>                 Key: MAHOUT-890
>                 URL:
>             Project: Mahout
>          Issue Type: Bug
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.6
>            Reporter: tom pierce
>         Attachments: addSynth.patch, logtrees.patch, smallexample.dat
> I've encountered a dataset which indicates there is probably a performance bug lurking
in the FPGrowth implementation.  This set may be a bit of an unusual target for FPG - there's
a relatively modest number itemsets, and many items with a Zipfy distribution.  I am attaching
a patch (addSynth.patch) to add a similar dataset as core/src/test/resources/FPGsynth.dat.
> FPGsynth.dat can take minutes or a few hours to process, depending on how it is grouped
out to machines.  If run in sequential mode, or with "-g 50" it will take considerable time.
 Most reducers/"anchor items" are processed quickly, but a small number take a handful of
minutes, and one or two take a long time.  If you experiment with this data, I suggest using
 '-s 50 -regex "[ ]+"'. 
> Digging into this, I've found that the tree pruning code sometimes creates surprising
trees.  One oddity I've observed is 0-count nodes, sometimes with non-zero children.  The
other is that sometimes subtrees seem to get repeated.  I'm attaching a sample input file
(smallexample.dat, use the whitespace regex with this one, too) and a patch which adds some
logging in pruneFPTree and growthBottomUp which will print out some interesting trees when
run with the smallexample.dat input.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message