mahout-dev mailing list archives

From "Jaroslaw Odzga (JIRA)" <>
Subject [jira] [Updated] (MAHOUT-625) Some of generated patterns have support higher than in reality
Date Mon, 21 Mar 2011 10:22:05 GMT


Jaroslaw Odzga updated MAHOUT-625:

    Attachment: final_patch_with_bug_fix_test_and_the_dataset.txt


Sorry for the long delay, but I was on holiday :)
I have attached a patch containing the fix, together with tests and the test dataset.
I'll create a separate issue for the performance improvement.

> Some of generated patterns have support higher than in reality
> --------------------------------------------------------------
>                 Key: MAHOUT-625
>                 URL:
>             Project: Mahout
>          Issue Type: Bug
>          Components: Frequent Itemset/Association Rule Mining
>    Affects Versions: 0.4
>            Reporter: Jaroslaw Odzga
>            Priority: Critical
>         Attachments: MAHOUT-625-patch.txt, bugfix-patch.txt, dataset_ok.txt, final_patch_with_bug_fix_test_and_the_dataset.txt,
> It turns out that some of the generated patterns have incorrect support: the returned support is slightly higher than the true one.
> I attached a test which proves that FPGrowth has a bug. The test uses the retail dataset found here:
> The pattern (36, 39, 41) occurs in the transactions 572 times (this is also calculated in the test), but FPGrowth returns the pattern (36, 39, 41) with support 573.
> Please note that this pattern is not the only one with incorrect support; the test points out only one example so that there is something to focus on. There are plenty more patterns with support higher than the real one. The biggest difference I noticed was a reported support 8 higher than the real one for one of the patterns.
> Please find attached the failing unit test. It is actually a Maven project which contains the test data and is ready to run.
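The check described in the report can be sketched as a brute-force support count. A pattern's support is the number of transactions that contain every one of its items, so a plain scan over the data gives a ground truth to compare against what FPGrowth reports. This is a minimal sketch, not the attached test and not Mahout's API. It assumes the dataset stores one transaction per line as whitespace-separated integer item IDs. The class and method names (`SupportCheck`, `countSupport`, `parseTransaction`, `maxOverReport`) and the toy data are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: brute-force oracle for itemset support (not part of Mahout).
public class SupportCheck {

    // Parses one transaction, assuming whitespace-separated integer item IDs.
    static Set<Integer> parseTransaction(String line) {
        Set<Integer> items = new HashSet<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                items.add(Integer.parseInt(token));
            }
        }
        return items;
    }

    // Support = number of transactions containing every item of the pattern.
    static long countSupport(List<Set<Integer>> transactions, Set<Integer> pattern) {
        long count = 0;
        for (Set<Integer> transaction : transactions) {
            if (transaction.containsAll(pattern)) {
                count++;
            }
        }
        return count;
    }

    // Largest (reported - actual) over all patterns; 0 if nothing is over-reported.
    static long maxOverReport(List<Set<Integer>> transactions,
                              Map<Set<Integer>, Long> reported) {
        long max = 0;
        for (Map.Entry<Set<Integer>, Long> entry : reported.entrySet()) {
            long actual = countSupport(transactions, entry.getKey());
            long diff = entry.getValue() - actual;
            if (diff > max) {
                max = diff;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Toy data, not the retail dataset.
        List<Set<Integer>> transactions = List.of(
                parseTransaction("36 39 41 48"),
                parseTransaction("36 39 41"),
                parseTransaction("36 39"),
                parseTransaction("39 41 48"));

        System.out.println(countSupport(transactions, Set.of(36, 39, 41))); // prints 2

        // Hypothetical miner output: (36, 39, 41) over-reported by one.
        Map<Set<Integer>, Long> reported = new LinkedHashMap<>();
        reported.put(Set.of(36, 39, 41), 3L);
        reported.put(Set.of(39, 41), 3L);
        System.out.println(maxOverReport(transactions, reported)); // prints 1
    }
}
```

The scan costs O(transactions × patterns), which is slow for a full run but acceptable as a test oracle. Its simplicity is the point: it cannot share whatever counting error the FP-tree code has.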

This message is automatically generated by JIRA.
For more information on JIRA, see:
