madlib-dev mailing list archives

From Frank McQuillan <fmcquil...@pivotal.io>
Subject Question about multiple symbol matches per row in MADlib path function - needed?
Date Fri, 08 Jul 2016 23:09:25 GMT
I have been musing on this JIRA:

Path - multiple symbol matches per row
https://issues.apache.org/jira/browse/MADLIB-943

and have become concerned about combinatorial explosion, even for a modest
number of symbol hits per row.

If each of the m rows in a partition matches n symbols, the number of
possible symbol combinations per partition is n^m.

For example, n=2 and m=50 gives 2^50, or roughly 1.1e15, symbol combinations,
which we certainly don't want to traverse.
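
To sanity-check the arithmetic, here is a small Python sketch (the function names are hypothetical, for illustration only). It compares the closed-form count n^m against brute-force enumeration of every per-row symbol assignment for small cases, then evaluates the n=2, m=50 example without enumerating anything.

```python
import itertools


def num_symbol_combinations(n: int, m: int) -> int:
    """Distinct symbol sequences when each of m rows can take any of n symbols."""
    return n ** m


def count_by_enumeration(n: int, m: int) -> int:
    # Brute force: enumerate every assignment of one of n symbols to each of m rows.
    return sum(1 for _ in itertools.product(range(n), repeat=m))


# Brute force agrees with the formula for small partitions...
for n, m in [(2, 3), (3, 4), (2, 10)]:
    assert count_by_enumeration(n, m) == num_symbol_combinations(n, m)

# ...but for the example above, enumeration is hopeless.
print(num_symbol_combinations(2, 50))           # 1125899906842624
print(f"{num_symbol_combinations(2, 50):.1e}")  # 1.1e+15
```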

Does anyone have experience or an opinion on this topic?

In the current version of MADlib.path()
http://madlib.incubator.apache.org/docs/latest/group__grp__path.html
a given row can match only one symbol. If a row satisfies multiple symbol
definitions, the symbol that comes first in the symbol definition list
takes precedence.
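
As a rough illustration of that first-match precedence rule, here is a Python sketch. It is not MADlib's implementation. The row fields (`event`, `price`), the symbols `A`/`B`, and the helper `first_match` are hypothetical. Symbol definitions are held in an ordered dict, and each row gets the first symbol whose predicate holds.

```python
from typing import Any, Callable, Dict, Optional

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

# Hypothetical symbol definitions, listed in precedence order
# (Python dicts preserve insertion order).
SYMBOLS: Dict[str, Predicate] = {
    "A": lambda r: r["event"] == "buy",
    "B": lambda r: r["price"] > 100,
}


def first_match(row: Row, symbols: Dict[str, Predicate]) -> Optional[str]:
    """Return the first symbol, in definition order, whose predicate holds."""
    for name, pred in symbols.items():
        if pred(row):
            return name
    return None  # the row matches no symbol


row = {"event": "buy", "price": 150}  # satisfies both A and B
print(first_match(row, SYMBOLS))      # A  (A is listed first, so it wins)
```

Because each row collapses to at most one symbol, a partition of m rows yields a single symbol string.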

In some Aster nPath examples I have seen, such as
https://aster-community.teradata.com/community/learn-aster/blog/2015/07/01/super-sweet-npath-examples-with-source-code
multiple symbols per row seem to be used.
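
To make the concern concrete, here is a Python sketch of a naive approach to multiple matches per row. It assumes every matched symbol is kept instead of only the first, and that every resulting symbol string is enumerated. The helpers and the single-character symbols `A`/`B` are hypothetical. The candidate count is the product of per-row match counts, which reduces to n^m when every row matches n symbols.

```python
import itertools
from math import prod
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

# Hypothetical single-character symbol definitions.
SYMBOLS: Dict[str, Predicate] = {
    "A": lambda r: r["event"] == "buy",
    "B": lambda r: r["price"] > 100,
}


def matched_symbols(row: Row, symbols: Dict[str, Predicate]) -> List[str]:
    """All symbols whose predicate holds for the row (no precedence)."""
    return [name for name, pred in symbols.items() if pred(row)]


def candidate_strings(rows: List[Row], symbols: Dict[str, Predicate]) -> Iterator[str]:
    """Lazily yield every symbol string formed by picking one matched symbol per row.
    If any row matches no symbol, this sketch yields nothing."""
    per_row = [matched_symbols(r, symbols) for r in rows]
    for choice in itertools.product(*per_row):
        yield "".join(choice)


def num_candidates(rows: List[Row], symbols: Dict[str, Predicate]) -> int:
    # Product of per-row match counts; equals n**m if every row matches n symbols.
    return prod(len(matched_symbols(r, symbols)) for r in rows)


rows = [
    {"event": "buy", "price": 150},   # matches A and B
    {"event": "sell", "price": 200},  # matches B only
    {"event": "buy", "price": 50},    # matches A only
]
print(list(candidate_strings(rows, SYMBOLS)))  # ['ABA', 'BBA']

# 50 rows that each match both symbols: count them, don't enumerate them.
print(num_candidates([{"event": "buy", "price": 150}] * 50, SYMBOLS))  # 1125899906842624
```

The generator keeps memory flat, but any pattern search that visits each candidate string still does work proportional to `num_candidates`.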

The question is: do we need to address this at all?

Frank
