Date: Tue, 27 Aug 2013 22:43:52 +0000 (UTC)
From: "Hudson (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

    [ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751824#comment-13751824 ]

Hudson commented on HIVE-5144:
------------------------------

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #73 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/73/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java

> HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-5144
>                 URL: https://issues.apache.org/jira/browse/HIVE-5144
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Ubuntu LXC + -Xmx512m client opts
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>              Labels: perfomance
>             Fix For: 0.12.0
>
>         Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch
>
>
> The map-join hashtable sink in the local task creates an in-memory hashtable with the following code.
> {code}
> Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias],
> ...
> MapJoinRowContainer rowContainer = tableContainer.get(key);
> if (rowContainer == null) {
>   rowContainer = new MapJoinRowContainer();
>   rowContainer.add(value);
> {code}
> But for a query where joinValues[alias].size() == 0, this results in a large number of unnecessary allocations, which would be better served by a copy-on-write default value container and a pre-allocated zero-length object array, which is immutable (the only immutable array there is in Java).
> The query tested is roughly the following, which scans all of customer_demographics in the hash sink:
> {code}
> select c_salutation, count(1)
> from customer
> JOIN customer_demographics ON customer.c_current_cdemo_sk = customer_demographics.cd_demo_sk
> group by c_salutation
> limit 10
> ;
> {code}
> When running with current trunk, the code results in an OOM with 512 MB of RAM.
> {code}
> 2013-08-23 05:11:26 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 292418944 percentage: 0.579
> Execution failed with exit status: 3
> Obtaining error information
> {code}
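
The idea named in the issue title, a single static empty row instead of a fresh zero-length `Object[]` per hashtable entry, can be illustrated with a minimal standalone sketch. This is not the HIVE-5144 patch and does not use Hive classes: `EmptyRowSketch`, `EMPTY_ROW`, `computeValues`, and `buildTable` are hypothetical names, plain `java.util` collections stand in for the table and row containers, and the copy-on-write default container mentioned in the description is not covered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EmptyRowSketch {
    // Shared zero-length row. A zero-length array has no elements to
    // overwrite, so one instance can be referenced from every entry.
    static final Object[] EMPTY_ROW = new Object[0];

    // Hypothetical stand-in for the value computation: returns the shared
    // empty row when no value columns are projected, else copies them out.
    static Object[] computeValues(Object[] row, int[] valueColumns) {
        if (valueColumns.length == 0) {
            return EMPTY_ROW; // no per-row allocation
        }
        Object[] out = new Object[valueColumns.length];
        for (int i = 0; i < valueColumns.length; i++) {
            out[i] = row[valueColumns[i]];
        }
        return out;
    }

    // Builds a key -> rows table, mirroring the get / create / add pattern
    // quoted in the issue description.
    static Map<Object, List<Object[]>> buildTable(
            Object[][] rows, int keyColumn, int[] valueColumns) {
        Map<Object, List<Object[]>> table = new HashMap<>();
        for (Object[] row : rows) {
            List<Object[]> container = table.get(row[keyColumn]);
            if (container == null) {
                container = new ArrayList<>(1);
                table.put(row[keyColumn], container);
            }
            container.add(computeValues(row, valueColumns));
        }
        return table;
    }

    public static void main(String[] args) {
        Object[][] rows = { {1, "a"}, {2, "b"}, {3, "c"} };
        Map<Object, List<Object[]>> table = buildTable(rows, 0, new int[0]);
        // Both entries reference the same array instance: prints true.
        System.out.println(table.get(1).get(0) == table.get(2).get(0));
    }
}
```

With no value columns projected, the number of value arrays stays at one regardless of how many keys are inserted. The containers themselves are still allocated per key in this sketch.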