asterixdb-notifications mailing list archives

From "Taewoo Kim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ASTERIXDB-1556) Hash Table used by External hash group-by doesn't conform to the budget.
Date Mon, 08 Aug 2016 18:18:20 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412200#comment-15412200
] 

Taewoo Kim commented on ASTERIXDB-1556:
---------------------------------------

One more thing regarding the hash table size (the number of unique h() values): 

[~dtabass] suggested that we can use BigInteger.nextProbablePrime(). Since each hash pointer
in the header of the hash table takes 8 bytes (2 ints: frame index and offset), the number
of h() values in a frame is frameSize / 8. So the upper bound N is frameSize / 8 * (maximum
number of frames). I would like to suggest that we pick a prime number x with 0.8N < x < 0.9N,
since if x is too close to N, the hash table itself can eventually occupy all of the frames and
there will not be enough space left to store the actual tuples. A weak point here is that we
can't assume that 0.8 and 0.9 form a good range. It just makes sure that the hash table side
never reaches 100% occupancy.
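
To make the arithmetic above concrete, here is a minimal sketch of the proposed sizing rule. The
class name, method name, and the example frame size and frame count are hypothetical and are
not taken from the AsterixDB code base; only BigInteger.nextProbablePrime() is a real JDK call,
and it returns a probable prime rather than a proven one.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration only; not part of AsterixDB.
class HashTableSizing {
    // Assumption from the comment above: each header entry is two 4-byte ints
    // (frame index, offset), so 8 bytes per h() value.
    static final int BYTES_PER_ENTRY = 8;

    // Returns a probable prime x with 0.8N < x < 0.9N, where
    // N = frameSize / 8 * maxFrames.
    static long chooseTableSize(int frameSize, int maxFrames) {
        long n = (long) (frameSize / BYTES_PER_ENTRY) * maxFrames;
        long lower = n * 8 / 10; // floor(0.8N)
        long upper = n * 9 / 10; // floor(0.9N)

        // nextProbablePrime() returns the first probable prime strictly greater
        // than its receiver, so the candidate is always above 0.8N.
        BigInteger candidate = BigInteger.valueOf(lower).nextProbablePrime();

        // For very small budgets there may be no prime inside the window.
        if (candidate.compareTo(BigInteger.valueOf(upper)) >= 0) {
            throw new IllegalStateException(
                    "No prime found between 0.8N and 0.9N for N = " + n);
        }
        return candidate.longValueExact();
    }

    public static void main(String[] args) {
        // Example values only: 32 KB frames and a 100-frame budget.
        System.out.println(chooseTableSize(32768, 100));
    }
}
```

With these example values N is 409600, so the chosen size falls between 327680 and 368640. The
0.8 and 0.9 factors are still just the tentative bounds discussed above, not tuned values.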

> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1556
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>         Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf, 3wayjoin.pdf, 3wayjoin.rtf,
3wayjoinplan.rtf
>
>
> When we enable the prefix-based fuzzy join and apply a multi-way fuzzy join (more than 2-way),
the system throws an out-of-memory exception.
> Since a fuzzy join is written in 30-40 lines of AQL code and this AQL is translated
into a massive number of operators (more than 200 operators in the plan for a 3-way fuzzy join),
it could cause an out-of-memory exception.
> /// Update: as the discussion went on, we found that the hash table in the external hash group-by
doesn't conform to the frame limit. So an out-of-memory exception happens during the execution
of an external hash group-by operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
