spark-issues mailing list archives

From "Tomas Kliegr (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-12999) Guidance on adding a stopping criterion (maximum literal length or itemset count) for FPGrowth
Date Tue, 26 Jan 2016 12:38:40 GMT
Tomas Kliegr created SPARK-12999:
------------------------------------

             Summary: Guidance on adding a stopping criterion (maximum literal length or itemset
count) for FPGrowth
                 Key: SPARK-12999
                 URL: https://issues.apache.org/jira/browse/SPARK-12999
             Project: Spark
          Issue Type: Question
    Affects Versions: 1.6.0
            Reporter: Tomas Kliegr


The absence of stopping criteria results in combinatorial explosion and hence excessive run
time even on small UCI datasets. Since our workflow makes it difficult to terminate an
FPGrowth job once it has been running for too long and to iteratively increase the support
threshold, we would like to extend the Spark FPGrowth implementation with either of the
following stopping criteria:
- a maximum number of generated itemsets,
- a maximum length of generated itemsets (i.e. number of items per itemset).

We would like to ask for any suggestion that could help us modify the
implementation. 
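To make the two proposed criteria concrete, here is a minimal, self-contained Python sketch of depth-first frequent-itemset enumeration (Eclat-style set intersection rather than Spark's actual FP-tree code) showing where each check would hook in: a length check that stops extending an itemset, and a count cap that halts mining altogether. All names (`frequent_itemsets`, `max_length`, `max_itemsets`) are hypothetical; in Spark's FPGrowth an analogous length check would presumably live in the FP-tree extraction recursion.

```python
def frequent_itemsets(transactions, min_support, max_length=None, max_itemsets=None):
    """Depth-first frequent-itemset enumeration with two stopping criteria:
    max_length bounds the number of items in any itemset, and max_itemsets
    caps the total number of itemsets emitted (mining stops at the cap).
    This is an illustrative sketch, not Spark's FPGrowth implementation."""
    # Map each item to the set of transaction ids containing it (Eclat-style).
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    # Keep only frequent single items, in a fixed order for deterministic output.
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_support)
    results = []

    def grow(prefix, prefix_tids, candidates):
        for idx, item in enumerate(candidates):
            if max_itemsets is not None and len(results) >= max_itemsets:
                return  # stopping criterion 1: itemset-count cap reached
            tids = prefix_tids & tidsets[item]
            if len(tids) < min_support:
                continue
            itemset = prefix + [item]
            results.append((itemset, len(tids)))
            # stopping criterion 2: do not extend beyond max_length items
            if max_length is None or len(itemset) < max_length:
                grow(itemset, tids, candidates[idx + 1:])

    grow([], set(range(len(transactions))), items)
    return results


tx = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
# Unbounded mining yields 6 frequent itemsets at min_support=2;
# max_length=2 prevents the 3-itemset candidate from being explored,
# and max_itemsets=3 halts after the first three itemsets are found.
print(frequent_itemsets(tx, 2, max_length=2))
print(frequent_itemsets(tx, 2, max_itemsets=3))
```

The count cap is checked before each extension, so it bounds work done, not just output size; the length check prunes the search tree below long prefixes, which is what limits the combinatorial explosion.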

A workaround for this problem would not only make a difference to our use case, but, by
making it possible to process more datasets without painful support-threshold tweaking,
hopefully also to the wider Spark community.

This question is related to the following issue: https://issues.apache.org/jira/browse/SPARK-12163




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

