Date: Mon, 8 Jan 2018 04:25:03 +0000 (UTC)
From: "Karan Mehta (JIRA)"
To: dev@phoenix.apache.org
Subject: [jira] [Comment Edited] (PHOENIX-4503) Phoenix-Spark plugin doesn't release zookeeper connections

    [ https://issues.apache.org/jira/browse/PHOENIX-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315649#comment-16315649 ]

Karan Mehta edited comment on PHOENIX-4503 at 1/8/18 4:24 AM:
--------------------------------------------------------------

The reason might be PHOENIX-4489. Could you try an experiment: pause your code midway, run a GC, and see whether the number of connections decreases? Each call to the {{read()}} method essentially creates an {{HConnection}}, which contains a {{ZKConnection}}. It should be garbage collected, since it goes out of scope as soon as the {{PhoenixInputFormat#generateSplits()}} method completes. Could you also provide the memory usage while the code is running, the JVM options, and the HBase version?

[~snalapure@dataken.net]

      was (Author: karanmehta93):
The reason might be PHOENIX-4489. Could you try an experiment: pause your code midway, run a GC, and see whether the number of connections decreases?
Each call to the {{read()}} method essentially creates an {{HConnection}}, which contains a {{ZKConnection}}. It should be garbage collected, since it goes out of scope as soon as the {{PhoenixInputFormat#generateSplits()}} method completes.

[~snalapure@dataken.net]

> Phoenix-Spark plugin doesn't release zookeeper connections
> ----------------------------------------------------------
>
>                 Key: PHOENIX-4503
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4503
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>         Environment: HBase 1.2 on Linux (Ubuntu, CentOS)
>            Reporter: Suhas Nalapure
>
> *1. Phoenix-Spark plugin doesn't release zookeeper connections*
> Example:
>
> {code:java}
> for(int i=0; i < 50; i++){
>     Dataset<Row> df = sqlContext.read().format("org.apache.phoenix.spark")
>         .option("table", "\"Sales\"").option("zkUrl", "localhost:2181")
>         .load();
>     df.show(2);
> }
> Thread.sleep(1000*60);
> {code}
>
> When the above snippet is executed, the number of connections to port 2181 keeps increasing, and the connections are not released until the main thread wakes up from sleep and the program ends, as can be seen below (14 is the number of connections before the program starts to run):
> netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:52:05
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 22
> 16:52:15
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 38
> 16:52:18
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 68
> 16:52:23
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 100
> 16:52:27
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:32
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:38
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:52
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:53:00
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:53:24
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:53:32
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:53:34
> root@user1 ~ $
>
> *2. If the "jdbc" format is used instead to create the Spark DataFrame, the connection count doesn't shoot up*
> Example:
>
> {code:java}
> for(int i=0; i < 50; i++){
>     Dataset<Row> df = sqlContext.read().format("jdbc")
>         .option("url", "jdbc:phoenix:localhost:2181")
>         .option("dbtable", "\"Sales\"")
>         .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
>         .load();
>     df.show(2);
> }
> Thread.sleep(1000*60);
> {code}
>
> Connection counts during program execution (14 being the count before execution starts):
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:00:42
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:00:43
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:46
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:50
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:55
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:12
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:18
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:28
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:34
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:37
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:39
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:02:07

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
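
For anyone wanting to try the GC experiment suggested in the comment, here is a minimal sketch of one way it might be done. It is not part of Phoenix or Spark: the class {{ZkConnectionCount}} and its methods are hypothetical names made up for illustration. It assumes a Linux {{netstat -an}} column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and that {{reportAroundGc}} is called from inside the Spark driver JVM, for example in place of the {{Thread.sleep}} call in the reporter's snippet. Note that {{System.gc()}} is only a hint to the JVM, so an unchanged count does not by itself prove a leak.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Phoenix or Spark API): counts established TCP connections on a port. */
final class ZkConnectionCount {

    /**
     * Counts netstat rows in state ESTABLISHED whose local or foreign address
     * ends in ":port". This is stricter than `grep 2181 | grep EST`, which
     * would also match a port such as 21810.
     */
    public static int countEstablished(List<String> netstatLines, int port) {
        String suffix = ":" + port;
        int count = 0;
        for (String line : netstatLines) {
            String[] f = line.trim().split("\\s+");
            // Skip headers and non-TCP rows; assumed layout is
            // Proto Recv-Q Send-Q Local Foreign State [...]
            if (f.length < 6 || !f[0].startsWith("tcp")) {
                continue;
            }
            boolean portMatch = f[3].endsWith(suffix) || f[4].endsWith(suffix);
            if (portMatch && f[5].equals("ESTABLISHED")) {
                count++;
            }
        }
        return count;
    }

    /** Runs `netstat -an` and returns its output lines (assumes netstat is on the PATH). */
    public static List<String> runNetstat() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("netstat", "-an").redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        p.waitFor();
        return lines;
    }

    /** Prints the connection count before and after requesting a GC in the current JVM. */
    public static void reportAroundGc(int port) throws IOException, InterruptedException {
        System.out.println("before GC: " + countEstablished(runNetstat(), port));
        System.gc();          // a request only; the JVM may ignore it
        Thread.sleep(5000);   // give finalizers and socket teardown some time
        System.out.println("after GC:  " + countEstablished(runNetstat(), port));
    }
}
```

Calling {{ZkConnectionCount.reportAroundGc(2181)}} right after the loop would show whether the extra connections go away once the unreachable connection objects are collected. The count is system-wide, as with the netstat pipeline in the report, so other ZooKeeper clients on the same host are included.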