madlib-dev mailing list archives

From Frank McQuillan <fmcquil...@pivotal.io>
Subject Proposed improvement to association rules (Apriori) algorithm
Date Thu, 27 Oct 2016 22:00:33 GMT
Here is a comment from a MADlib user that I recently heard:

“No apparent way to set an upper bound for itemset size in assoc_rules
function. This results in it running forever with larger data sets. In the
R "arules" package, you can set a max itemset size so that it doesn't look
for unnecessarily large associations.”
https://cran.r-project.org/web/packages/arules/arules.pdf

Does it make sense to add a single optional parameter to
http://madlib.incubator.apache.org/docs/latest/group__grp__assoc__rules.html
similar to the maxlen parameter in “arules”?

Are there any other considerations here, or improvements we should make to
this algorithm at the same time? A minlen parameter, for example?
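
To make the proposal concrete, here is a minimal Python sketch of how an upper
bound on itemset size stops the level-wise Apriori search early. This is not
MADlib's implementation. The function name frequent_itemsets and the
parameters max_len and min_len are hypothetical, chosen only to mirror the
maxlen/minlen idea from "arules".

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=None, min_len=1):
    """Level-wise Apriori search for frequent itemsets.

    max_len and min_len are hypothetical parameters for illustration:
    max_len stops the search once itemsets reach that size, and
    min_len filters smaller itemsets out of the returned result.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    # Level 1 candidates: every distinct item as a singleton itemset.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]

    result = {}
    k = 1
    # Stop when no candidates remain or the size bound has been reached.
    while candidates and (max_len is None or k <= max_len):
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {
            c: count / n for c, count in counts.items() if count / n >= min_support
        }

        if k >= min_len:
            result.update(frequent)

        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose every k-subset is itself frequent (the Apriori pruning step).
        previous = set(frequent)
        next_candidates = set()
        for a in previous:
            for b in previous:
                union = a | b
                if len(union) == k + 1 and all(
                    frozenset(subset) in previous
                    for subset in combinations(union, k)
                ):
                    next_candidates.add(union)

        candidates = list(next_candidates)
        k += 1

    return result


if __name__ == "__main__":
    baskets = [
        {"a", "b", "c"},
        {"a", "b"},
        {"a", "c"},
        {"b", "c"},
        {"a", "b", "c"},
    ]
    for itemset, support in frequent_itemsets(baskets, 0.4, max_len=2).items():
        print(sorted(itemset), support)
```

With max_len set, the loop never generates or counts candidates larger than
the bound, which is where the savings on larger data sets would come from. A
min_len filter only trims the output, because the smaller itemsets are still
needed to build the larger ones.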

Thanks,
Frank
