asterixdb-notifications mailing list archives

From "Michael J. Carey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ASTERIXDB-1556) Prefix-based multi-way Fuzzy-join generates an exception.
Date Sat, 06 Aug 2016 05:57:20 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410502#comment-15410502 ]

Michael J. Carey commented on ASTERIXDB-1556:
---------------------------------------------

I don't think step (4) makes sense or is needed.  If the sum of the space (D+H) exceeds the
budget, invoke the algorithm's current spilling logic - end of change.  We needn't change
the spilling policy itself, not logically - we just have to change the definition of "too
full" to consider the space being used to be D+H instead of D alone.  The rest of the logic
should remain unchanged.  Anything more than that seems like unnecessary complexity.  (Not
sure what it would accomplish.)  Steps (1)-(3) make perfect sense and sound good/right to
me.
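
The change described above amounts to redefining a single predicate. Here is a minimal sketch, assuming space is counted in frames; `JoinMemory`, `too_full`, `add_frames`, and the `spill` callback are hypothetical names used for illustration only, not AsterixDB or Hyracks APIs. The `spill` callback stands in for the algorithm's current spilling logic, which stays untouched.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JoinMemory:
    """Hypothetical frame accounting for one join operator."""
    budget_frames: int      # total frames the operator may use
    data_frames: int = 0    # D: frames held by the data table
    hash_frames: int = 0    # H: frames held by the hash table

    def too_full_data_only(self) -> bool:
        # Old definition of "too full": only D is compared to the budget.
        return self.data_frames > self.budget_frames

    def too_full(self) -> bool:
        # Proposed definition: D + H is compared to the budget.
        return self.data_frames + self.hash_frames > self.budget_frames


def add_frames(mem: JoinMemory, data: int, hashed: int,
               spill: Callable[[JoinMemory], None]) -> None:
    """Record newly used frames, then defer to the existing spill logic."""
    mem.data_frames += data
    mem.hash_frames += hashed
    if mem.too_full():
        spill(mem)  # the spilling policy itself is not changed
```

With a budget of 10 frames, D = 8 and H = 3, the data-only check reports that everything fits, while the D+H check reports "too full" and hands control to the unchanged spilling code.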

If you want to clean this up even more, budget-wise, perhaps you could slightly change the
logic to first ask the Hash Table how many frames it would need to add one entry.  Its answer
could be 0 (which would almost always be the case), 1, or 2.  You could then pass that info
in to the Data Table buffer manager (i.e., tell it how big the insert will cause the total
amount of HT space to be) so that it knows what the total impact of the operation would be
on space used - and then it could make the more global decision itself.
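
The "ask first, then decide" flow could be sketched roughly as follows. Everything here is an assumption made for illustration: `ToyHashTable`, `DataTableBufferManager`, `can_accept`, and `try_insert` are hypothetical names, and the toy hash table only ever answers 0 or 1 frames, whereas the real one might also answer 2.

```python
class ToyHashTable:
    """Hypothetical stand-in that grows by one frame every `entries_per_frame` inserts."""

    def __init__(self, entries_per_frame: int):
        self.entries_per_frame = entries_per_frame
        self.entries = 0
        self.frames = 0

    def frames_needed_for_insert(self) -> int:
        # 0 while the last frame still has room, 1 when a new frame is required.
        return 1 if self.entries == self.frames * self.entries_per_frame else 0

    def insert(self) -> None:
        self.frames += self.frames_needed_for_insert()
        self.entries += 1


class DataTableBufferManager:
    """Hypothetical buffer manager that makes the global D + H decision."""

    def __init__(self, budget_frames: int):
        self.budget_frames = budget_frames
        self.data_frames = 0

    def can_accept(self, new_data_frames: int, hash_frames_after_insert: int) -> bool:
        total = self.data_frames + new_data_frames + hash_frames_after_insert
        return total <= self.budget_frames


def try_insert(ht: ToyHashTable, dt: DataTableBufferManager, new_data_frames: int) -> bool:
    # Step 1: ask the hash table how much it would grow.
    hash_after = ht.frames + ht.frames_needed_for_insert()
    # Step 2: let the data table buffer manager judge the total impact.
    if not dt.can_accept(new_data_frames, hash_after):
        return False  # the caller would invoke the existing spilling logic here
    dt.data_frames += new_data_frames
    ht.insert()
    return True
```

Because the check happens before either structure is modified, a rejected insert leaves both frame counts unchanged, so the spilling code starts from a state that is still within budget.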

Could you draw a picture of how memory is used when all this is happening and put it in the
docs somewhere?  One thing I am uncertain about is how memory looks with multiple partitions,
and I would like to be sure we've got things under proper control in that respect.  (I am
wondering how things are set up to make spilling fairly efficient/painless.)


> Prefix-based multi-way Fuzzy-join generates an exception.
> ---------------------------------------------------------
>
>                 Key: ASTERIXDB-1556
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>         Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf, 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and run a multi-way fuzzy-join (more than 2-way),
> the system throws an out-of-memory exception.
> A fuzzy-join is written as 30-40 lines of AQL code, and this AQL is translated into a
> massive number of operators (more than 200 operators in the plan for a 3-way fuzzy join),
> which could cause the out-of-memory exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
