Date: Wed, 22 Feb 2017 20:36:44 +0000 (UTC)
From: "Alan Gates (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-15882) HS2 generating high memory pressure with many partitions and concurrent queries

[ https://issues.apache.org/jira/browse/HIVE-15882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879174#comment-15879174 ]

Alan Gates commented on HIVE-15882:
-----------------------------------

Have you had a chance to rerun the jxray tool with the patch applied? It would be interesting to see how much total memory these changes save, and which Strings, properties, and collections need to be tackled next.

> HS2 generating high memory pressure with many partitions and concurrent queries
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-15882
>                 URL: https://issues.apache.org/jira/browse/HIVE-15882
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: HIVE-15882.01.patch, hs2-crash-2000p-500m-50q.txt
>
>
> I've created a Hive table with 2000 partitions, each backed by two files, with one row in each file.
> When I execute some number of concurrent queries against this table, e.g. as follows
> {code}
> for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:10000 -n admin -p admin -e "select count(i_f_1) from misha_table;" & done
> {code}
> it results in a big memory spike. With 20 queries I caused an OOM in an HS2 server with -Xmx200m, and with 50 queries in one with -Xmx500m.
> I am attaching the results of jxray (www.jxray.com) analysis of a heap dump that was generated in the 50 queries/500m heap scenario. It suggests that there are several opportunities to reduce memory pressure with not very invasive changes to the code:
> 1. 24.5% of memory is wasted by duplicate strings (see section 6). With String.intern() calls added in the ~10 relevant places in the code, this overhead can be greatly reduced.
> 2. Almost 20% of memory is wasted due to various suboptimally used collections (see section 8). There are many maps and lists that are either empty or have just 1 element. By modifying the code that creates and populates these collections, we can likely save 5-10% of memory.
> 3. Almost 20% of memory is used by instances of java.util.Properties. It looks like these objects are highly duplicated, since for each Partition each concurrently running query creates its own copy of Partition, PartitionDesc and Properties. Thus we have nearly 100,000 (50 queries * 2,000 partitions) Properties in memory. By interning/deduplicating these objects we may be able to save perhaps 15% of memory.
> So overall, I think there is a good chance to reduce HS2 memory consumption in this scenario by ~40%.
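
For readers following the thread, the three ideas in the description (interning duplicate strings, avoiding full-size collections that hold zero or one element, and sharing identical Properties objects) can be sketched roughly as below. This is only an illustration under stated assumptions, not the contents of HIVE-15882.01.patch: the class `DedupSketch` and its methods are hypothetical names, and the sketch assumes the shared objects are never mutated after being deduplicated.

```java
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper class for illustration; it is NOT part of Hive.
final class DedupSketch {
    // Canonical Properties instances, keyed by content (Properties uses
    // content-based equals/hashCode). Assumption: callers treat the
    // returned instances as read-only, since they are shared.
    private static final ConcurrentHashMap<Properties, Properties> CANONICAL =
            new ConcurrentHashMap<>();

    // Idea 1: collapse equal strings onto one shared instance.
    static String internString(String s) {
        return s == null ? null : s.intern();
    }

    // Idea 2: replace empty or single-element lists with compact,
    // immutable equivalents. Callers must not modify the result.
    static <T> List<T> compact(List<T> list) {
        switch (list.size()) {
            case 0:
                return Collections.emptyList();
            case 1:
                return Collections.singletonList(list.get(0));
            default:
                return list;
        }
    }

    // Idea 3: return one shared Properties object per distinct content,
    // so 50 queries * 2,000 partitions need not hold 100,000 copies.
    // Caveat: equality ignores any "defaults" chained into a Properties.
    static Properties internProperties(Properties p) {
        if (p == null) {
            return null;
        }
        Properties existing = CANONICAL.putIfAbsent(p, p);
        return existing != null ? existing : p;
    }
}
```

One design caveat: a strongly referenced canonical map like this never releases entries, so a real implementation would probably need weak references or an explicit lifetime tied to the query or session.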