I have been musing on this JIRA:
Path - multiple symbol matches per row
https://issues.apache.org/jira/browse/MADLIB-943
and have become concerned about combinatorial explosion, even for a modest
number of symbol hits per row.
For n symbols per row and m rows in a partition, the number of symbol
combinations per partition is n^m.
For example, n=2 and m=50 gives 2^50, or ~1e15, symbol combinations, which we
certainly don't want to traverse.
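To make the growth concrete, here is a quick Python sketch. It assumes every row matches exactly n symbols and that a brute-force matcher would try every one-symbol-per-row assignment. The helper names are made up for illustration and are not MADlib functions.

```python
import itertools

def symbol_combinations(n: int, m: int) -> int:
    """Number of ways to pick one of n matching symbols for each of m rows."""
    return n ** m

def enumerate_combinations(row_symbols):
    """Brute force: yield every assignment of one matching symbol per row."""
    return itertools.product(*row_symbols)

# Sanity check on a tiny partition: 3 rows, each matching symbols A and B.
rows = [("A", "B"), ("A", "B"), ("A", "B")]
brute_force_count = sum(1 for _ in enumerate_combinations(rows))

print(brute_force_count)            # 8 == 2^3
print(symbol_combinations(2, 50))   # 1125899906842624, roughly 1.1e15
```

Enumerating even the small case is already exponential. At m=50 the brute-force loop is out of reach, although the closed-form count is trivial to compute.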
Does anyone have experience or an opinion on this topic?
In the current version of MADlib.path()
http://madlib.incubator.apache.org/docs/latest/group__grp__path.html
a given row can only match one symbol. If a row matches multiple symbols,
the symbol that comes first in the symbol definition list will take
precedence.
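Here is a rough Python model of the first-match rule described above, set beside the "all matches" view that multiple symbols per row would need. It is not MADlib's implementation, which runs in SQL. The symbol names, predicates and row fields are hypothetical.

```python
from typing import Optional

# Hypothetical symbol definitions, in the order they appear in the list.
symbol_defs = [
    ("BUY", lambda row: row["event"] == "buy"),
    ("PRICEY", lambda row: row["price"] > 100),
]

def first_match(row, defs) -> Optional[str]:
    """Current behaviour: the earliest symbol whose predicate holds wins."""
    for name, predicate in defs:
        if predicate(row):
            return name
    return None  # row matches no symbol

def all_matches(row, defs) -> list:
    """Multi-symbol behaviour: keep every symbol the row satisfies."""
    return [name for name, predicate in defs if predicate(row)]

row = {"event": "buy", "price": 150}
print(first_match(row, symbol_defs))  # BUY
print(all_matches(row, symbol_defs))  # ['BUY', 'PRICEY']
```

With first-match semantics each row contributes exactly one symbol, so there is only one symbol sequence per partition. Keeping all matches is what opens the door to the n^m combinations.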
In some examples I have seen elsewhere, e.g.
https://astercommunity.teradata.com/community/learnaster/blog/2015/07/01/supersweetnpathexampleswithsourcecode
it seems that multiple symbols per row are allowed.
The question is: do we need to address this at all?
Frank
