Date: Wed, 6 Apr 2011 20:37:05 +0000 (UTC)
From: "Namit Jain (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Message-ID: <591933936.38647.1302122225818.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <804119427.20200.1301442365721.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HIVE-2082) Reduce memory consumption in preparing MapReduce job

[
https://issues.apache.org/jira/browse/HIVE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016515#comment-13016515 ]

Namit Jain commented on HIVE-2082:
----------------------------------

Minor comments in review board.

> Reduce memory consumption in preparing MapReduce job
> ----------------------------------------------------
>
>                 Key: HIVE-2082
>                 URL: https://issues.apache.org/jira/browse/HIVE-2082
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-2082.patch, HIVE-2082.patch, HIVE-2082.patch
>
>
> The Hive client consumes a lot of memory when the number of input partitions is large. One reason is that each partition maintains a list of FieldSchema objects, which is intended to handle schema evolution. However, these lists are not currently used, and Hive uses the table-level schema for all partitions. This will be fixed in HIVE-2050. The memory consumed by this part will be reduced by almost half (1.2GB to 700MB for 20k partitions).
> Another large chunk of memory is consumed in the MapReduce job setup phase, when a PartitionDesc is created from each Partition object. A property object is maintained in PartitionDesc which contains a full list of columns and types. For the same reason, these should be the same as in the table-level schema. The deserializer initialization also takes a large amount of memory, which should be avoided. My initial testing of these optimizations cut the memory consumption in half (700MB to 300MB for 20k partitions).
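
The idea in the issue description (per-partition column and type properties duplicate the table-level schema) can be illustrated with a minimal Java sketch. This is not the HIVE-2082 patch and does not use Hive's classes: `PartitionInfo`, `tableSchema`, `buildWithCopies`, `buildShared`, and the example property values and paths are all hypothetical stand-ins. It only shows the difference between giving every partition its own copy of the schema properties and letting all partitions reference one shared table-level object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Properties;
import java.util.Set;

final class SharedSchemaSketch {

    // Hypothetical stand-in for a per-partition descriptor (not Hive's PartitionDesc).
    static final class PartitionInfo {
        final String location;
        final Properties schema;

        PartitionInfo(String location, Properties schema) {
            this.location = location;
            this.schema = schema;
        }
    }

    // Illustrative table-level schema; the keys and values are assumptions.
    static Properties tableSchema() {
        Properties p = new Properties();
        p.setProperty("columns", "id,name,ds");
        p.setProperty("columns.types", "int:string:string");
        return p;
    }

    // Costly variant: every partition holds its own copy of the same properties.
    static List<PartitionInfo> buildWithCopies(Properties table, int partitions) {
        List<PartitionInfo> out = new ArrayList<>(partitions);
        for (int i = 0; i < partitions; i++) {
            Properties copy = new Properties();
            copy.putAll(table);
            out.add(new PartitionInfo("/warehouse/t/ds=" + i, copy));
        }
        return out;
    }

    // Cheaper variant: every partition references the single table-level object.
    static List<PartitionInfo> buildShared(Properties table, int partitions) {
        List<PartitionInfo> out = new ArrayList<>(partitions);
        for (int i = 0; i < partitions; i++) {
            out.add(new PartitionInfo("/warehouse/t/ds=" + i, table));
        }
        return out;
    }

    // Counts distinct Properties instances by identity, not by equals().
    static int distinctSchemaObjects(List<PartitionInfo> parts) {
        Set<Properties> seen =
            Collections.newSetFromMap(new IdentityHashMap<Properties, Boolean>());
        for (PartitionInfo p : parts) {
            seen.add(p.schema);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Properties table = tableSchema();
        int n = 20_000;
        System.out.println("copied: " + distinctSchemaObjects(buildWithCopies(table, n)));
        System.out.println("shared: " + distinctSchemaObjects(buildShared(table, n)));
    }
}
```

Sharing one object is only safe if no partition mutates it and if all partitions really do have the table's schema, which the description says is the case today because per-partition schemas are not used.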