Date: Mon, 9 May 2016 23:37:12 +0000 (UTC)
From: "JinsuKim (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Updated] (HIVE-13665) HS2 memory leak When multiple queries are running with get_json_object

    [ https://issues.apache.org/jira/browse/HIVE-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JinsuKim updated HIVE-13665:
----------------------------
    Affects Version/s: 2.0.0

> HS2 memory leak When multiple queries are running with get_json_object
> ----------------------------------------------------------------------
>
>                 Key: HIVE-13665
>                 URL: https://issues.apache.org/jira/browse/HIVE-13665
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 2.0.0
>            Reporter: JinsuKim
>         Attachments: patch.lst.txt
>
>
> The extractObjectCache in UDFJson grows far beyond its intended limit (CACHE_SIZE = 16) when multiple queries using get_json_object or get_json_tuple run concurrently on HS2 in local mode (not mr/tez).
> {code:java|title=HS2 heap_dump}
> Object at 0x515ab18f8
> instance of org.apache.hadoop.hive.ql.udf.UDFJson$HashCache@0x515ab18f8 (77 bytes)
> Class:
> class org.apache.hadoop.hive.ql.udf.UDFJson$HashCache
> Instance data members:
> accessOrder (Z) : false
> entrySet (L) :
> hashSeed (I) : 0
> header (L) : java.util.LinkedHashMap$Entry@0x515a577d0 (60 bytes)
> keySet (L) :
> loadFactor (F) : 0.6
> modCount (I) : 4741146
> size (I) : 2733158 <========== here!!
> table (L) : [Ljava.util.HashMap$Entry;@0x7163d8b70 (67108880 bytes)
> threshold (I) : 5033165
> values (L) :
> References to this object:
> {code}
> I think this problem is caused by the cache's underlying LinkedHashMap not being thread-safe; from the LinkedHashMap javadoc:
> {code}
> * Note that this implementation is not synchronized.
> * If multiple threads access a linked hash map concurrently, and at least
> * one of the threads modifies the map structurally, it must be
> * synchronized externally. This is typically accomplished by
> * synchronizing on some object that naturally encapsulates the map.
> {code}
> Reproduce:
> # Run multiple queries with get_json_object over small input data (so they execute in HS2 local mode, not mr/tez).
> # Take a JVM heap dump and analyze it.
> {code:title=test scenario}
> Multiple queries running with get_json_object and small input data (so they execute in HS2 local mode)
>
> 1.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040105'
> 2.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040106'
> 3.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040107'
> 4.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040108'
>
> run.sh :
> t_cnt=0
> while true
> do
>     echo "query executing..."
>     for i in 1 2 3 4
>     do
>         beeline -u jdbc:hive2://localhost:10000 -n hive --silent=true -f $i.hql > $i.log 2>&1 &
>     done
>     wait
>     t_cnt=`expr $t_cnt + 1`
>     echo "query count : $t_cnt"
>     sleep 2
> done
>
> jvm heap dump & analyze :
> jmap -dump:format=b,file=hive.dmp $PID
> jhat -J-mx48000m -port 8080 hive.dmp &
> {code}
> Finally, I have attached our patch.
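For readers who want to see the failure mode outside of Hive, below is a minimal, self-contained sketch of the pattern the heap dump points at: a LinkedHashMap subclass that caps its size via removeEldestEntry, shared unsynchronized across threads. The class name, constants, and workload are illustrative assumptions, not code copied from UDFJson. With a single thread the map never exceeds its cap; with concurrent, unsynchronized puts its internal bookkeeping can race, so the size can grow far beyond the cap or the map can become corrupted, which is consistent with the size of 2733158 seen in the dump above.

{code:java|title=UnsafeLruCacheDemo.java (illustrative sketch, not Hive code)}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class UnsafeLruCacheDemo {

    // Same pattern the heap dump shows: a LinkedHashMap subclass that relies on
    // removeEldestEntry to keep at most CACHE_SIZE entries. Constants are illustrative.
    static class HashCache<K, V> extends LinkedHashMap<K, V> {
        private static final int CACHE_SIZE = 16;

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > CACHE_SIZE; // evicts at most one entry per put()
        }
    }

    // A single shared, unsynchronized cache, analogous to a cache field reused
    // by every concurrent query thread inside one HS2 process.
    static final Map<String, String> CACHE = new HashCache<>();

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < 500_000; i++) {
                    // Distinct keys per thread, like distinct JSON path expressions.
                    CACHE.put(id + ":" + i, "v");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);

        // Single-threaded this prints at most 16; under unsynchronized concurrent
        // puts the size can end up much larger (or the run may hang because the
        // hash table's internal links get corrupted).
        System.out.println("cache size = " + CACHE.size());
    }
}
{code}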
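The attached patch.lst.txt is not reproduced in this message, so the following is only a hedged sketch of two common mitigations for this kind of bug, not necessarily what the attachment does: either give each thread its own cache instance, or serialize access to the shared one. All names here are illustrative assumptions.

{code:java|title=CacheFixSketch.java (possible mitigations, assumptions only)}
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheFixSketch {

    // Same bounded-LRU pattern as above; names and constants are illustrative.
    static class HashCache<K, V> extends LinkedHashMap<K, V> {
        private static final int CACHE_SIZE = 16;

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > CACHE_SIZE;
        }
    }

    // Option 1: one cache per thread, so concurrent queries never share mutable state.
    static final ThreadLocal<Map<String, Object>> THREAD_LOCAL_CACHE =
            ThreadLocal.withInitial(HashCache::new);

    // Option 2: keep a single shared cache but serialize every access to it.
    static final Map<String, Object> SYNCHRONIZED_CACHE =
            Collections.synchronizedMap(new HashCache<String, Object>());
}
{code}

The per-thread variant trades a little extra memory for contention-free access; the synchronized variant keeps one cache but adds lock contention on every lookup.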