phoenix-dev mailing list archives

From "Suhas Nalapure (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-4503) Phoenix-Spark plugin doesn't release zookeeper connections
Date Wed, 27 Dec 2017 12:46:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suhas Nalapure updated PHOENIX-4503:
------------------------------------
    Description: 
*1. Phoenix-Spark plugin doesn't release zookeeper connections*
Example: 
		
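{code:java}
// Load the same table 50 times through the phoenix-spark data source.
for (int i = 0; i < 50; i++) {
    Dataset<Row> df = sqlContext.read().format("org.apache.phoenix.spark")
            .option("table", "\"Sales\"").option("zkUrl", "localhost:2181")
            .load();
    df.show(2);
}
// Keep the JVM alive for 60 seconds so the open connections can be observed.
Thread.sleep(1000 * 60);
{code}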
   
When the above snippet is executed, the number of connections to port 2181 keeps increasing,
and the connections are not released until the main thread wakes up from sleep and the program
ends, as shown below (14 is the number of connections before the program starts to run):
netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
14
16:52:05
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
22
16:52:15
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
38
16:52:18
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
68
16:52:23
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
100
16:52:27
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
116
16:52:32
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
116
16:52:38
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
116
16:52:52
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
116
16:53:00
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
116
16:53:24
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
14
16:53:32
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
14
16:53:34
root@user1 ~ $
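
For anyone who wants to take the same measurement from a test harness rather than by hand, the shell pipeline above can be approximated in Java. This is only a sketch: the class name ZkConnectionCounter, the method countEstablished, and the sample lines are hypothetical and do not come from the report. It assumes the Linux `netstat -an` column layout (proto, recv-q, send-q, local address, foreign address, state) and is stricter than the two greps, because it matches the port only in the address columns.

```java
import java.util.List;

/** Counts ESTABLISHED TCP connections involving a given port in netstat-style output. */
final class ZkConnectionCounter {

    /**
     * Stricter variant of `grep <port> | grep EST | wc -l`: a line is counted only if
     * its state column is ESTABLISHED and its local or foreign address ends in ":<port>".
     */
    static long countEstablished(List<String> netstatLines, int port) {
        String suffix = ":" + port;
        return netstatLines.stream()
                .map(line -> line.trim().split("\\s+"))
                // Assumed columns: proto, recv-q, send-q, local, foreign, state, ...
                .filter(cols -> cols.length >= 6 && cols[5].equals("ESTABLISHED"))
                .filter(cols -> cols[3].endsWith(suffix) || cols[4].endsWith(suffix))
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical sample lines for illustration; not taken from the report.
        List<String> sample = List.of(
                "tcp 0 0 127.0.0.1:2181 127.0.0.1:51514 ESTABLISHED 1234/java",
                "tcp 0 0 127.0.0.1:51514 127.0.0.1:2181 ESTABLISHED 5678/java",
                "tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 1234/java",
                "tcp 0 0 127.0.0.1:22181 127.0.0.1:40000 ESTABLISHED 999/other");
        System.out.println(countEstablished(sample, 2181));
    }
}
```

Note that when client and server run on the same host, each connection shows up as two lines (one per endpoint), as in the first two sample lines, so the raw count is not necessarily the number of distinct connections.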

*2. If the "jdbc" format is used instead to create the Spark DataFrame, the connection count
doesn't shoot up*
Example:
		
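{code:java}
// Same loop, but reading through Spark's generic "jdbc" source with the Phoenix driver.
for (int i = 0; i < 50; i++) {
    Dataset<Row> df = sqlContext.read().format("jdbc")
            .option("url", "jdbc:phoenix:localhost:2181")
            .option("dbtable", "\"Sales\"")
            .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
            .load();
    df.show(2);
}
// Keep the JVM alive for 60 seconds so the open connections can be observed.
Thread.sleep(1000 * 60);
{code}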
		
Connection counts during program execution (14 being the count before execution starts):

root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
14
17:00:42
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
14
17:00:43
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
16
17:00:46
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
16
17:00:50
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
16
17:00:55
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
16
17:01:12
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
16
17:01:18
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
16
17:01:28
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
16
17:01:34
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
16
17:01:37
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
16
17:01:39
root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
14
17:02:07
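
To put the two runs side by side, the peak growth over the baseline of 14 can be computed directly from the counts listed in both netstat transcripts. The sketch below is only illustrative arithmetic on those numbers; the class name ConnectionGrowth and the method peakOverBaseline are hypothetical.

```java
import java.util.Arrays;

/** Compares the peak connection growth of the two runs reported above. */
final class ConnectionGrowth {

    /** Peak sample minus the baseline: the extra connections held at the worst point. */
    static int peakOverBaseline(int baseline, int[] samples) {
        return Arrays.stream(samples).max().orElse(baseline) - baseline;
    }

    public static void main(String[] args) {
        int baseline = 14;
        // Counts copied from the two netstat transcripts in this description.
        int[] phoenixSpark = {14, 22, 38, 68, 100, 116, 116, 116, 116, 116, 14, 14};
        int[] jdbc = {14, 14, 16, 16, 16, 16, 16, 16, 16, 16, 16, 14};
        System.out.println(peakOverBaseline(baseline, phoenixSpark)); // prints 102
        System.out.println(peakOverBaseline(baseline, jdbc));         // prints 2
    }
}
```

With 50 loop iterations, a growth of 102 is roughly two extra sockets per iteration for the phoenix-spark format, against 2 in total for the jdbc format. Since both endpoints are on localhost, each connection may be listed twice by netstat, so this could correspond to about one unreleased connection per iteration; that reading is an assumption, not something the transcripts establish.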


> Phoenix-Spark plugin doesn't release zookeeper connections
> ----------------------------------------------------------
>
>                 Key: PHOENIX-4503
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4503
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>         Environment: HBase 1.2 on Linux (Ubuntu, CentOS)
>            Reporter: Suhas Nalapure



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
