asterixdb-notifications mailing list archives

From "Taewoo Kim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ASTERIXDB-1556) Hash Table used by External hash group-by doesn't conform to the budget.
Date Wed, 17 Aug 2016 23:45:21 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425608#comment-15425608
] 

Taewoo Kim commented on ASTERIXDB-1556:
---------------------------------------

[~dtabass] suggested an idea for garbage collection, and I fully agree with it. It is
feasible without changing the current structure. The missing piece of the puzzle, supplied by
[~dtabass], is written in red. Here are the steps for the garbage collection:

#1. Allocate a new frame.
#2. Read a content frame of the hash table.
#3. Read a slot's information and check the slot's used count. If it is greater
than zero (meaning that the slot is currently in use), copy the slot into the newly allocated
frame and update the corresponding h() value pointer in the header frame to point to the new
location. {color:red}*We can find the h() value of the slot using the first tuple pointer in
the slot*.{color} If the count is zero, reset the corresponding h() value pointer in the header
frame, again using the first tuple pointer in the slot.
#4. Once a content frame has been read fully, deallocate that content frame.
#5. Repeat #2 - #4 until the newly allocated frame becomes full. Then allocate another new frame
and continue.
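
The steps above can be sketched roughly as follows. This is only an illustrative model, not the actual Hyracks hash table code: the class name HashTableGcSketch, the Slot layout, SLOTS_PER_FRAME, the tupleKeys array (a stand-in for dereferencing a tuple pointer) and the modulo-based h() are all assumptions made for the example. It also assumes one slot per h() value, with the header frame storing the slot's location as frameIndex * SLOTS_PER_FRAME + offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; names and layout are assumptions, not the Hyracks API.
class HashTableGcSketch {
    static final int SLOTS_PER_FRAME = 2; // assumed (tiny) frame capacity
    static final int EMPTY = -1;          // header entry with no slot

    static final class Slot {
        final int usedCount;         // number of tuple pointers in use
        final int firstTuplePointer; // first tuple pointer stored in the slot
        Slot(int usedCount, int firstTuplePointer) {
            this.usedCount = usedCount;
            this.firstTuplePointer = firstTuplePointer;
        }
    }

    final int[] tupleKeys;   // tuple pointer -> key (stand-in for reading the tuple)
    final int[] headerFrame; // h() value -> encoded slot location, or EMPTY
    List<List<Slot>> contentFrames = new ArrayList<>();

    HashTableGcSketch(int[] tupleKeys, int tableSize) {
        this.tupleKeys = tupleKeys;
        this.headerFrame = new int[tableSize];
        Arrays.fill(headerFrame, EMPTY);
    }

    // Recover the h() value of a slot from a tuple pointer (the idea in red).
    int h(int tuplePointer) {
        return Math.floorMod(tupleKeys[tuplePointer], headerFrame.length);
    }

    // Helper to build a table: append a slot and record its location in the header.
    void addSlot(int tuplePointer, int usedCount) {
        if (contentFrames.isEmpty()
                || contentFrames.get(contentFrames.size() - 1).size() == SLOTS_PER_FRAME) {
            contentFrames.add(new ArrayList<>());
        }
        List<Slot> frame = contentFrames.get(contentFrames.size() - 1);
        frame.add(new Slot(usedCount, tuplePointer));
        headerFrame[h(tuplePointer)] =
                (contentFrames.size() - 1) * SLOTS_PER_FRAME + frame.size() - 1;
    }

    void collectGarbage() {
        List<List<Slot>> compacted = new ArrayList<>();
        List<Slot> target = new ArrayList<>();            // #1: allocate a new frame
        compacted.add(target);
        for (int f = 0; f < contentFrames.size(); f++) {
            List<Slot> source = contentFrames.get(f);     // #2: read a content frame
            for (Slot slot : source) {                    // #3: inspect each slot
                int hash = h(slot.firstTuplePointer);
                if (slot.usedCount > 0) {
                    if (target.size() == SLOTS_PER_FRAME) { // #5: target frame is full
                        target = new ArrayList<>();
                        compacted.add(target);
                    }
                    target.add(slot);
                    headerFrame[hash] =
                            (compacted.size() - 1) * SLOTS_PER_FRAME + target.size() - 1;
                } else {
                    headerFrame[hash] = EMPTY;            // unused slot: reset the pointer
                }
            }
            contentFrames.set(f, null);                   // #4: deallocate the source frame
        }
        contentFrames = compacted;
    }

    public static void main(String[] args) {
        HashTableGcSketch table = new HashTableGcSketch(new int[] {0, 1, 2, 3, 4}, 8);
        table.addSlot(0, 1);
        table.addSlot(1, 0); // garbage
        table.addSlot(2, 2);
        table.addSlot(3, 0); // garbage
        table.addSlot(4, 1);
        table.collectGarbage();
        System.out.println("content frames: " + table.contentFrames.size());
        System.out.println("header frame:   " + Arrays.toString(table.headerFrame));
    }
}
```

Because each source frame is released as soon as it has been read (#4), the extra memory needed during collection in this model stays at roughly one frame at a time, which is what makes the approach compatible with a frame budget.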

> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1556
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>              Labels: soon
>         Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf, 3wayjoin.pdf, 3wayjoin.rtf,
3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join (more than 2 ways),
the system throws an out-of-memory exception.
> Since a fuzzy-join is written in 30-40 lines of AQL code and this AQL is translated
into a massive number of operators (more than 200 operators in the plan for a 3-way fuzzy join),
it could cause an out-of-memory exception.
> /// Update: as the discussion progressed, we found that the hash table in the external hash
group-by doesn't conform to the frame limit. So, an out-of-memory exception happens during the
execution of an external hash group-by operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)