hive-dev mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered
Date Fri, 28 Feb 2014 00:02:19 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915238#comment-13915238 ]

Gopal V commented on HIVE-6518:
-------------------------------

Yes, also the ORC scenario is more complex for strings in dictionaries. 

Taking a substring does not release the rest of the string's memory, because in vectorized
mode only the start:len values get modified; no new allocations are made.

So a GROUP BY on SUBSTR() will keep the entire string in memory, but the VGBY does not know
that it does.
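
To illustrate the start:len point, here is a minimal sketch. It does not use Hive's actual
classes; SliceColumnSketch and all of its fields and methods are hypothetical names made up
for this example. It assumes each row is a (buffer, start, length) reference into a shared
byte array, and that a substring only narrows start and length.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for a vectorized string column.
// Each row references a region of a byte[] instead of owning a copy.
final class SliceColumnSketch {
    private final byte[][] buffer; // backing array per row (may be shared)
    private final int[] start;     // first visible byte per row
    private final int[] length;    // number of visible bytes per row

    SliceColumnSketch(int rows) {
        buffer = new byte[rows][];
        start = new int[rows];
        length = new int[rows];
    }

    // Point a row at a region of an existing array; nothing is copied.
    void setRef(int row, byte[] source, int offset, int len) {
        buffer[row] = source;
        start[row] = offset;
        length[row] = len;
    }

    // Substring by adjusting start/length only; the backing array is untouched.
    void substrInPlace(int row, int offset, int len) {
        int skip = Math.min(offset, length[row]);
        start[row] += skip;
        length[row] = Math.min(len, length[row] - skip);
    }

    // Bytes the row logically exposes after the substring.
    int visibleBytes(int row) {
        return length[row];
    }

    // Bytes that stay reachable (and so cannot be collected) through this row.
    int retainedBytes(int row) {
        return buffer[row].length;
    }

    String value(int row) {
        return new String(buffer[row], start[row], length[row], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] blob = "alpha-bravo-charlie".getBytes(StandardCharsets.UTF_8);
        SliceColumnSketch col = new SliceColumnSketch(1);
        col.setRef(0, blob, 0, blob.length);
        col.substrInPlace(0, 6, 5);
        System.out.println(col.value(0) + " visible=" + col.visibleBytes(0)
                + " retained=" + col.retainedBytes(0));
    }
}
```

An operator that estimates memory from the visible length would undercount here, since the
whole backing array stays reachable for as long as the row reference is held.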

> Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-6518
>                 URL: https://issues.apache.org/jira/browse/HIVE-6518
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>         Attachments: HIVE-6518.1-tez.patch
>
>
> The current VectorGroupByOperator implementation flushes the in-memory hashes when the
> maximum entries or fraction of memory is hit.
> This works for most cases, but there are some corner cases where we hit GC overhead limits
> or heap size limits before either of those conditions is reached, due to the rest of the pipeline.
> This patch adds a SoftReference as a GC canary. If the soft reference is dead, then a
> full GC pass happened sometime in the recent past and the aggregation hashtables should be
> flushed immediately before another full GC is triggered.
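
The SoftReference canary idea can be sketched roughly as follows. This is not the code from
the attached patch; GcCanarySketch and its methods are hypothetical, and the aggregation is
reduced to a per-key count. It relies only on java.lang.ref.SoftReference, whose referent the
JVM may clear in response to memory demand and must clear before throwing OutOfMemoryError.
The simulateFullGc() helper is an assumption added so the behavior can be shown
deterministically.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a group-by buffer that flushes when a GC canary dies.
final class GcCanarySketch {
    // Softly reachable object; the collector may clear it under memory pressure.
    private SoftReference<Object> canary = new SoftReference<>(new Object());
    private final Map<String, Long> counts = new HashMap<>();
    private int flushCount = 0;

    // True once the collector (or simulateFullGc) has cleared the canary.
    boolean canaryCleared() {
        return canary.get() == null;
    }

    // Aggregate one key, then flush if the canary shows recent memory pressure.
    void add(String key) {
        counts.merge(key, 1L, Long::sum);
        if (canaryCleared()) {
            flush();
        }
    }

    // Stand-in for emitting partial aggregates downstream; here we just drop them.
    void flush() {
        counts.clear();
        flushCount++;
        // Re-arm the canary so the next clearing can be detected.
        canary = new SoftReference<>(new Object());
    }

    // Demo-only helper: clears the reference the way the collector would.
    void simulateFullGc() {
        canary.clear();
    }

    int bufferedKeys() {
        return counts.size();
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        GcCanarySketch agg = new GcCanarySketch();
        agg.add("a");
        agg.add("b");
        System.out.println("buffered before: " + agg.bufferedKeys());
        agg.simulateFullGc();
        agg.add("c");
        System.out.println("buffered after: " + agg.bufferedKeys()
                + ", flushes: " + agg.flushCount());
    }
}
```

The check is a single reference read per row, so it adds little cost next to the existing
entry-count and memory-fraction conditions, which would stay in place alongside it.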



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
