hbase-issues mailing list archives

From "Allan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16713) Bring back connection caching as a client API
Date Fri, 30 Jun 2017 02:50:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069388#comment-16069388 ]

Allan Yang commented on HBASE-16713:
------------------------------------

Yes, please bring connection caching back. Currently, we have to use the deprecated ConnectionManager.getConnection()
in branch-1.
We have cases similar to Spark's, where many short-lived threads access HBase; if connections
are not shared, there may be too many concurrent ZK connections.
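
For context, a minimal sketch of what callers end up hand-rolling today to let many short-lived threads reuse one cluster connection (SharedConnection is an illustrative name, not an HBase class; it only assumes the standard ConnectionFactory.createConnection(Configuration) client call):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    /** Illustrative holder letting many short-lived threads share one cluster connection. */
    public final class SharedConnection {
      private static volatile Connection connection;  // one ZK/cluster connection, shared

      /** Lazily create the connection once; callers must NOT close the returned instance. */
      public static Connection get(Configuration conf) throws IOException {
        if (connection == null) {
          synchronized (SharedConnection.class) {
            if (connection == null) {
              connection = ConnectionFactory.createConnection(conf);
            }
          }
        }
        return connection;
      }

      /** Explicit shutdown, e.g. from an application lifecycle hook. */
      public static synchronized void shutdown() throws IOException {
        if (connection != null) {
          connection.close();
          connection = null;
        }
      }
    }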


> Bring back connection caching as a client API
> ---------------------------------------------
>
>                 Key: HBASE-16713
>                 URL: https://issues.apache.org/jira/browse/HBASE-16713
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, spark
>            Reporter: Enis Soztutar
>             Fix For: 2.0.0
>
>
> Connection.getConnection() is removed in master for good reasons. The connection lifecycle
> should always be explicit. We have replaced some of the functionality with ConnectionCache
> for the REST and Thrift servers internally, but it is not exposed to clients.
> Turns out our friends doing the hbase-spark connector work need a similar connection-caching
> behavior to what we have in the REST and Thrift servers. At a higher level we want:
>  - Spark executors should be able to run short-lived HBase tasks with low latency.
>  - Short-lived tasks should be able to share the same connection, and should not pay
> the price of instantiating the cluster connection (which means a ZK connection, meta cache,
> 200+ threads, etc.).
>  - Connections to the cluster should be closed if they are not used for some time. Spark
> executors are used for other tasks as well.
>  - Spark jobs may be launched with different configuration objects, possibly connecting
> to different clusters between different jobs.
>  - Although not a direct requirement for Spark, different users should not share the
> same connection object.
> Looking at the old code that we have in branch-1 for {{ConnectionManager}}, managed connections,
> and the code in ConnectionCache, I think we should do a first-class client-level API called
> ConnectionCache which will be a hybrid between ConnectionCache and the old ConnectionManager.
> The lifecycle of the ConnectionCache is still explicit, so I think, API-design-wise, this will
> fit into the current model.
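
For illustration, a rough sketch of what such a client-level cache could look like (ClientConnectionCache, its cache key, and all method signatures below are hypothetical, not the internal ConnectionCache used by the REST and Thrift servers; only Configuration, Connection, and ConnectionFactory are existing classes):

    import java.io.Closeable;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    /** Hypothetical client-level cache: one Connection per (cluster, user), explicit lifecycle. */
    public class ClientConnectionCache implements Closeable {
      private final Map<String, Connection> cache = new ConcurrentHashMap<>();

      /** Returns the shared connection for this (conf, user) pair, creating it on first use. */
      public Connection getConnection(Configuration conf, String user) throws IOException {
        // Illustrative key: cluster identity plus user, so different users never share a connection.
        String key = conf.get("hbase.zookeeper.quorum") + "|" + user;
        try {
          return cache.computeIfAbsent(key, k -> {
            try {
              return ConnectionFactory.createConnection(conf);
            } catch (IOException e) {
              throw new UncheckedIOException(e);
            }
          });
        } catch (UncheckedIOException e) {
          throw e.getCause();
        }
      }

      // A real implementation would also run a chore that closes connections idle past a TTL,
      // per the "closed if not used for some time" requirement above.

      /** Explicit lifecycle: closing the cache closes every cached connection. */
      @Override
      public void close() throws IOException {
        for (Connection c : cache.values()) {
          c.close();
        }
        cache.clear();
      }
    }

Keeping the cache itself Closeable is what preserves the explicit-lifecycle model described above: callers own the cache, not the individual connections.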



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
