hive-dev mailing list archives

From "Gopal V (JIRA)" <>
Subject [jira] [Created] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead
Date Fri, 23 Aug 2013 17:16:52 GMT
Gopal V created HIVE-5144:

             Summary: HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead
                 Key: HIVE-5144
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
         Environment: Ubuntu LXC + -Xmx4096m client opts
            Reporter: Gopal V
            Assignee: Gopal V
            Priority: Minor

The map-join hashtable sink in the local task builds an in-memory hashtable with the following code:

 Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias],
 MapJoinRowContainer rowContainer = tableContainer.get(key);
    if (rowContainer == null) {
      rowContainer = new MapJoinRowContainer();

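However, for a query where joinValues[alias].size() == 0, this results in a large number of
unnecessary allocations: every row allocates a new empty Object[], and every new key allocates a
new MapJoinRowContainer. These would be better served by a copy-on-write default value container
and a single pre-allocated zero-length Object[]. A zero-length array is immutable (the only
immutable array there is in Java), so one instance can be shared safely.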
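
A minimal sketch of the shared empty-row idea, assuming a simplified value projection. The class
EmptyRowSketch, the method computeValues and the valueColumns parameter are hypothetical names used
only for illustration; they are not Hive's actual JoinUtil or HashTableSinkOperator code, and the
copy-on-write container is not shown.

```java
// Hypothetical stand-in for the value projection in a map-join hashtable sink.
// This is NOT Hive's JoinUtil; it only illustrates the shared empty-row idea.
final class EmptyRowSketch {
    // A zero-length array has no elements to overwrite, so a single
    // instance can be shared by every row without defensive copies.
    static final Object[] EMPTY_ROW = new Object[0];

    // Copies the selected columns out of the row. When no value columns
    // are needed, returns the shared EMPTY_ROW instead of allocating.
    static Object[] computeValues(Object[] row, int[] valueColumns) {
        if (valueColumns.length == 0) {
            return EMPTY_ROW; // no per-row allocation
        }
        Object[] values = new Object[valueColumns.length];
        for (int i = 0; i < valueColumns.length; i++) {
            values[i] = row[valueColumns[i]];
        }
        return values;
    }
}
```

With this shape, rows that contribute only a join key all map to the same array instance, so the
per-row cost of the value side drops to a reference.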

The query tested is roughly the following, which scans all of customer_demographics in the hash-sink:


select c_salutation, count(1)
 from customer
      JOIN customer_demographics ON customer.c_current_cdemo_sk = customer_demographics.cd_demo_sk
 group by c_salutation
 limit 10


When run against current trunk, the query fails with an OOM with 512 MB of RAM.

2013-08-23 05:11:26	Processing rows:	1400000	Hashtable size:	1399999	Memory usage:	292418944	percentage:	0.579

Execution failed with exit status: 3
Obtaining error information
