asterixdb-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taewoo Kim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ASTERIXDB-1556) Hash Table used by External hash group-by doesn't conform to the budget.
Date Sat, 17 Sep 2016 22:04:20 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15499768#comment-15499768
] 

Taewoo Kim commented on ASTERIXDB-1556:
---------------------------------------

@[~buyingyi]: If we assume that each field consists of at least 4 bytes, that means there
are 32 bits. The max value of INT is 2^31 -1, which is greater than 32MB (2^25), 64MB (2^26),
or 128MB (2^27). What I'm trying to say here is that for a reasonable budget size, we always
choose the number of possible tuples in data table rather than considering both (# of tuples,
# of hash entries). I think the formula might be reduced to y = 32M / (8+4+1) * 40 to calculate
the expected byte size of hash table. Then y / (32 + y) to get the actual ratio. How's your
thought? 


> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1556
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>            Priority: Critical
>              Labels: soon
>         Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf, 3wayjoin.pdf, 3wayjoin.rtf,
3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply the multi-way fuzzy-join ( > 2),
the system generates an out-of-memory exception. 
> Since a fuzzy-join is created using 30-40 lines of AQL codes and this AQL is translated
into massive number of operators (more than 200 operators in the plan for a 3-way fuzzy join),
it could generate out-of-memory exception.
> /// Update: as the discussion goes, we found that hash table in the external hash group
by doesn't conform to the frame limit. So, an out of memory exception happens during the execution
of an external hash group by operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message