Date: Wed, 27 Jun 2018 20:53:00 +0000 (UTC)
From: "Misha Dmitriev (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Updated] (HIVE-19668) Over 30% of the heap wasted by duplicate org.antlr.runtime.CommonToken's and duplicate strings

Misha Dmitriev updated HIVE-19668:
----------------------------------
    Attachment: HIVE-19668.02.patch

> Over 30% of the heap wasted by duplicate org.antlr.runtime.CommonToken's and duplicate strings
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19668
>                 URL: https://issues.apache.org/jira/browse/HIVE-19668
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>            Priority: Major
>         Attachments: HIVE-19668.01.patch, HIVE-19668.02.patch, image-2018-05-22-17-41-39-572.png
>
>
> I've recently analyzed an HS2 heap dump, obtained when there was a huge memory spike during compilation of a big query. The analysis was done with jxray ([www.jxray.com|http://www.jxray.com]). It turns out that more than 90% of the 20G heap was used by data structures associated with query parsing ({{org.apache.hadoop.hive.ql.parse.QBExpr}}). There are probably multiple opportunities for optimizations here.
> One of them is to stop the code from creating duplicate instances of the {{org.antlr.runtime.CommonToken}} class. See a sample of these objects in the attached image:
> !image-2018-05-22-17-41-39-572.png|width=879,height=399!
> These particular {{CommonToken}} objects appear to be constants that don't change once created. I see some code, e.g. in {{org.apache.hadoop.hive.ql.parse.CalcitePlanner}}, where such objects are apparently created repeatedly, e.g. with {{new CommonToken(HiveParser.TOK_INSERT, "TOK_INSERT")}}. If these 33 token kinds are instead created once and reused, we will save more than 1/10th of the heap in this scenario. Plus, since these objects are small but very numerous, getting rid of them will remove a great deal of pressure from the GC.
> Another source of waste is duplicate strings, which collectively waste 26.1% of memory. Some of them come from CommonToken objects that have the same text (i.e. multiple CommonToken objects whose 'text' Strings have identical contents, but each has its own copy of that String). Other duplicate strings come from other sources and are easy enough to fix by adding String.intern() calls.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
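The two fixes proposed in the HIVE-19668 description above (create each constant token once and reuse it; intern duplicate strings) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the attached patch: `ConstToken`, `TokenCache` and `DedupDemo` are hypothetical names, `ConstToken` is a stand-in for ANTLR's `CommonToken` so the sketch compiles without ANTLR, and the type id `42` is made up rather than the real value of `HiveParser.TOK_INSERT`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for org.antlr.runtime.CommonToken so the sketch compiles
// without ANTLR. Its fields are final, so one instance can be shared safely.
final class ConstToken {
    final int type;
    final String text;

    ConstToken(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Hypothetical cache holding one shared token per token type.
final class TokenCache {
    private static final Map<Integer, ConstToken> CACHE = new ConcurrentHashMap<>();

    private TokenCache() {}

    // Returns the cached token for `type`, creating it on first use only.
    // Assumes each constant token type always has the same text, so the text
    // passed on later calls for an already-cached type is ignored.
    static ConstToken get(int type, String text) {
        return CACHE.computeIfAbsent(type, t -> new ConstToken(t, text));
    }
}

final class DedupDemo {
    public static void main(String[] args) {
        int tokInsert = 42; // hypothetical type id, not HiveParser.TOK_INSERT's real value

        // Two lookups yield one shared object instead of two duplicate tokens.
        ConstToken a = TokenCache.get(tokInsert, "TOK_INSERT");
        ConstToken b = TokenCache.get(tokInsert, "TOK_INSERT");
        System.out.println(a == b); // true

        // Two equal strings built at runtime are distinct heap objects...
        String s1 = new String("TOK_INSERT");
        String s2 = new String("TOK_INSERT");
        System.out.println(s1 == s2); // false
        // ...until intern() maps both to one canonical copy.
        System.out.println(s1.intern() == s2.intern()); // true
    }
}
```

Sharing a token is only safe if no code mutates it afterwards. ANTLR's real `CommonToken` has setters such as `setText` and `setLine`, so an actual change would need to confirm that the reused tokens are never modified after creation. `String.intern()` trades a lookup in the JVM's string table for the memory saved, which pays off when, as described above, many copies of the same text are kept alive.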