madlib-dev mailing list archives

From Frank McQuillan <>
Subject Question about multiple symbol matches per row in MADlib path function - needed?
Date Fri, 08 Jul 2016 23:09:25 GMT
I have been musing on this JIRA:

Path - multiple symbol matches per row

and have become concerned with combinatorial explosion, even for a modest number
of symbol hits per row.

For n symbols per row and m rows in a partition, the number of symbol
combinations per partition is n^m.

e.g., for n=2 and m=50 this results in ~1e15 symbol combinations which we
certainly don't want to traverse.
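
To make the n^m growth concrete, here is a minimal Python sketch. It is not MADlib code; the function names and the toy symbols "A" and "B" are made up for illustration. It computes the count for n=2, m=50 and enumerates the candidate symbol sequences for a tiny 3-row partition.

```python
from itertools import product
from typing import List


def count_symbol_combinations(n_symbols_per_row: int, m_rows: int) -> int:
    """Number of distinct symbol sequences when every row matches n symbols."""
    return n_symbols_per_row ** m_rows


def enumerate_symbol_sequences(row_matches: List[List[str]]) -> List[str]:
    """All symbol sequences for a partition, one symbol chosen per row.

    row_matches[i] holds the symbols that row i matches (hypothetical input
    shape, chosen only for this illustration).
    """
    return ["".join(seq) for seq in product(*row_matches)]


# The case from the message: 2 symbols per row, 50 rows per partition.
total = count_symbol_combinations(2, 50)
print(f"{total:.3e} combinations")

# A tiny partition where enumeration is still feasible: 3 rows, 2 symbols each.
small = enumerate_symbol_sequences([["A", "B"]] * 3)
print(len(small), small)
```

Even at 3 rows the list already has 8 entries; every additional row doubles it.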

Does anyone have experience or an opinion on this topic?

In the current version of MADlib.path()
a given row can only match one symbol. If a row matches multiple symbols,
the symbol that comes first in the symbol definition list will take
precedence.
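
As a rough illustration of that first-match rule, the sketch below assigns each row the first symbol in an ordered definition list whose condition holds. This is only a Python model of the behavior described above, not how MADlib implements it; the symbol names, the `amount` column, and the helper function are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]
SymbolDef = Tuple[str, Callable[[Row], bool]]


def first_matching_symbol(row: Row, symbol_defs: List[SymbolDef]) -> Optional[str]:
    """Return the first symbol, in definition order, whose predicate matches."""
    for name, predicate in symbol_defs:
        if predicate(row):
            return name
    return None  # the row matches no symbol


# Hypothetical, overlapping definitions: any row matching BIG also matches ANY.
symbol_defs: List[SymbolDef] = [
    ("BIG", lambda r: r["amount"] > 100),
    ("ANY", lambda r: r["amount"] > 0),
]

rows: List[Row] = [{"amount": 150}, {"amount": 20}, {"amount": 0}]
labels = [first_matching_symbol(r, symbol_defs) for r in rows]
print(labels)
```

Under this rule each partition yields exactly one symbol sequence, so the work grows with m rather than n^m.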

In some examples I have seen elsewhere,
it seems that multiple symbols per row are used.

The question is: do we need to address this at all?

