hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-10600) optimize group by for GC
Date Mon, 04 May 2015 21:10:07 GMT
Sergey Shelukhin created HIVE-10600:
---------------------------------------

             Summary: optimize group by for GC
                 Key: HIVE-10600
                 URL: https://issues.apache.org/jira/browse/HIVE-10600
             Project: Hive
          Issue Type: Bug
            Reporter: Sergey Shelukhin


Quote [~gopalv]:
{noformat}
So, something like a sum() GROUP BY will create a few hundred thousand
AbstractAggregationBuffer objects all of which will suddenly go out of
scope when the map.aggr flushes it down to the sort buffer.

That particular GC collection takes forever because the tiny buffers take
a lot of time to walk over and then they leave the memory space
fragmented, which requires a compaction pass (which btw, writes to a
page-interleaved NUMA zone).

And to make things worse, the pre-allocated sort buffers with absolutely
zero data in them take up most of the tenured regions causing these chunks
of memory to be visited more and more often as they are part of the Eden
space.
{noformat}
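To make the allocation pattern in the quote concrete, here is a minimal hypothetical sketch of an object-per-group sum(). It is not Hive's code: `SumBuffer` is an assumed stand-in for `AbstractAggregationBuffer`, and `ObjectPerGroupSum` is an illustrative name. Each new key allocates a buffer object, a boxed `Long` key and a map entry, and `flush()` turns all of them into garbage at once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Hive code): one small heap object per group.
final class ObjectPerGroupSum {
    // Assumed stand-in for a per-group aggregation buffer.
    static final class SumBuffer {
        long sum;
    }

    private final Map<Long, SumBuffer> groups = new HashMap<>();

    void add(long key, long value) {
        // A new key costs a SumBuffer, a boxed Long and a HashMap entry.
        groups.computeIfAbsent(key, k -> new SumBuffer()).sum += value;
    }

    long sum(long key) {
        SumBuffer b = groups.get(key);
        return b == null ? 0L : b.sum;
    }

    int groupCount() {
        return groups.size();
    }

    // Simulates the map-side aggregation flush: every buffer becomes
    // unreachable at the same moment, leaving many tiny dead objects.
    void flush() {
        groups.clear();
    }
}
```

With a few hundred thousand distinct keys, this is the mass of short-lived tiny objects that the collector has to walk and then compact around.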

We need flat data structures, rather than many tiny per-group buffer objects, for GROUP BY to be GC-friendly.
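One possible reading of "flat" is sketched below, assuming a sum() over `long` keys and values. The class name and layout are hypothetical, not the design chosen for this issue: group state lives in three parallel primitive arrays (an open-addressing hash table with linear probing), so the collector sees a handful of large arrays instead of one object per group, and `clear()` reuses the arrays instead of discarding them.

```java
import java.util.Arrays;

// Hypothetical sketch: sum() GROUP BY state in parallel primitive arrays.
final class FlatSumAggregator {
    private long[] keys;
    private long[] sums;
    private boolean[] used;
    private int size;

    FlatSumAggregator(int expectedGroups) {
        // Power-of-two capacity, at least twice the expected group count.
        int cap = 2;
        while (cap < expectedGroups * 2) {
            cap <<= 1;
        }
        keys = new long[cap];
        sums = new long[cap];
        used = new boolean[cap];
    }

    // Linear probing; terminates because the table is kept at most half full.
    private static int findSlot(long[] ks, boolean[] us, long key) {
        int mask = ks.length - 1;
        int i = (int) ((key * 0x9E3779B97F4A7C15L) >>> 32) & mask;
        while (us[i] && ks[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    void add(long key, long value) {
        if ((size + 1) * 2 > keys.length) {
            grow();
        }
        int i = findSlot(keys, used, key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        sums[i] += value; // no per-group allocation
    }

    long sum(long key) {
        int i = findSlot(keys, used, key);
        return used[i] ? sums[i] : 0L;
    }

    int size() {
        return size;
    }

    // After the rows are emitted, reset in place: nothing becomes garbage.
    void clear() {
        Arrays.fill(used, false);
        Arrays.fill(sums, 0L);
        size = 0;
    }

    // Doubles capacity and rehashes every occupied slot.
    private void grow() {
        int cap = keys.length * 2;
        long[] newKeys = new long[cap];
        long[] newSums = new long[cap];
        boolean[] newUsed = new boolean[cap];
        for (int i = 0; i < keys.length; i++) {
            if (used[i]) {
                int j = findSlot(newKeys, newUsed, keys[i]);
                newUsed[j] = true;
                newKeys[j] = keys[i];
                newSums[j] = sums[i];
            }
        }
        keys = newKeys;
        sums = newSums;
        used = newUsed;
    }
}
```

The trade-off is that each aggregate type needs its own primitive layout, whereas the object-per-group approach handles arbitrary buffer types generically.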



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
