Subject: Re: Review Request 56687: Intern strings in various critical places to reduce memory consumption.
From: Misha Dmitriev
To: Sergio Pena, Mohit Sabharwal, Chaoyu Tang
Cc: Vihang Karajgaonkar, hive, Sahil Takiar, Rui Li, Misha Dmitriev
Date: Fri, 24 Feb 2017 21:27:57 -0000

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56687/
-----------------------------------------------------------

(Updated Feb. 24, 2017, 9:27 p.m.)


Review request for hive, Chaoyu Tang, Mohit Sabharwal, and Sergio Pena.


Changes
-------

Addressed Rui's and Mohit's comments.


Bugs: https://issues.apache.org/jira/browse/HIVE-15882


Repository: hive-git


Description
-------

See the description of the problem in https://issues.apache.org/jira/browse/HIVE-15882

The string interning in this review removes most of the memory overhead caused by duplicate strings.

Also, in several places where a map is created from another map, the new map is now sized using the original map's size. This avoids creating a map with the default capacity (typically 16) to hold just 2-3 entries, which wastes the rest of its internal 16-slot array.
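The two techniques described above (interning duplicate strings, and sizing a copied map from its source) can be illustrated with a minimal Java sketch. Note the assumptions: the class InternSketch and its methods internOrNull, internInPlace and copyInterned are hypothetical names for illustration only, not the StringInternUtils API added by this patch, and the sketch assumes the JDK's String.intern() as the interning mechanism.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, NOT the StringInternUtils API from this patch.
 * Shows (1) replacing duplicate String instances with one canonical copy
 * via String.intern(), and (2) copying a map into a container sized from
 * the source map instead of the default capacity of 16.
 */
final class InternSketch {
  private InternSketch() {}

  /** Returns the canonical (interned) instance of s; null stays null. */
  static String internOrNull(String s) {
    return s == null ? null : s.intern();
  }

  /** Interns every element in place; the list must support set(). */
  static List<String> internInPlace(List<String> list) {
    for (int i = 0; i < list.size(); i++) {
      list.set(i, internOrNull(list.get(i)));
    }
    return list;
  }

  /**
   * Copies src into a LinkedHashMap (preserving src's iteration order),
   * interning keys and values. The initial capacity is derived from
   * src.size() and the default load factor 0.75, so a 2-entry source gets
   * a 4-slot table and filling the copy never triggers a resize.
   */
  static Map<String, String> copyInterned(Map<String, String> src) {
    int capacity = (int) (src.size() / 0.75f) + 1;
    Map<String, String> dst = new LinkedHashMap<>(capacity);
    for (Map.Entry<String, String> e : src.entrySet()) {
      dst.put(internOrNull(e.getKey()), internOrNull(e.getValue()));
    }
    return dst;
  }

  public static void main(String[] args) {
    // Equal strings created at runtime are distinct objects...
    String a = new String("ds=2017-02-24");
    String b = new String("ds=2017-02-24");
    System.out.println(a == b);                             // false
    // ...but interning maps both to one shared instance.
    System.out.println(internOrNull(a) == internOrNull(b)); // true
  }
}
```

Dividing by the load factor is one common sizing choice: passing the raw size as the initial capacity can still cause one resize for some sizes (e.g. java.util.HashMap with capacity 4 has a resize threshold of 3, so a 4th entry triggers a resize).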
Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/common/StringInternUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java e81cbce3e333d44a4088c10491f399e92a505293
  ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java 08420664d59f28f75872c25c9f8ee42577b23451
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java e91064b9c75e8adb2b36f21ff19ec0c1539b03b9
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 51530ac16c92cc75d501bfcb573557754ba0c964
  ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java 55b3b551a1dac92583b6e03b10beb8172ca93d45
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java 82dc89803be9cf9e0018720eeceb90ff450bfdc8
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java c0edde9e92314d86482b5c46178987e79fae57fe
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java c6ae6f290857cfd10f1023058ede99bf4a10f057
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 24d16812515bdfa90b4be7a295c0388fcdfe95ef
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java ede4fcbe342052ad86dadebcc49da2c0f515ea98
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/NullScanTaskDispatcher.java 0882ae2c6205b1636cbc92e76ef66bb70faadc76
  ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java 68b0ad9ea63f051f16fec3652d8525f7ab07eb3f
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java d4bdd96eaf8d179bed43b8a8c3be0d338940154a
  ql/src/java/org/apache/hadoop/hive/ql/plan/MsckDesc.java b7a7e4b7a5f8941b080c7805d224d3885885f444
  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java 73981e826870139a42ad881103fdb0a2ef8433a2

Diff: https://reviews.apache.org/r/56687/diff/


Testing
-------

I've measured how much memory this change, together with another one (interning Properties in PartitionDesc), saves in my HS2 benchmark: the result is 37%. See the details in HIVE-15882.
Thanks,

Misha Dmitriev