hbase-dev mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-17009) Revisiting the removing of managed connection and connection caching
Date Thu, 03 Nov 2016 07:29:59 GMT
Yu Li created HBASE-17009:

             Summary: Revisiting the removing of managed connection and connection caching
                 Key: HBASE-17009
                 URL: https://issues.apache.org/jira/browse/HBASE-17009
             Project: HBase
          Issue Type: Task
            Reporter: Yu Li
            Assignee: Yu Li

In HBASE-13197 we did a lot of good cleanup of the Connection API, but as part of that work
HBASE-13252 dropped the managed connection and connection caching features. This JIRA proposes
revisiting that decision, for the reasons below.

Assume we have a long-running process with multiple threads accessing HBase (a common case
for streaming applications). Let's compare what happened before with what happens now.

Previously, users could create an HTable instance whenever they wanted without worrying about the
underlying connections, because the HBase client managed them automatically: no matter how many
threads there were, there would be only one Connection instance for the same configuration.
  // HTable constructor in 1.x (excerpt)
  public HTable(Configuration conf, final TableName tableName)
  throws IOException {
    // ...
    this.connection = ConnectionManager.getConnectionInternal(conf);
    // ...
  }

  // ConnectionManager in 1.x (excerpt): one cached connection per HConnectionKey
  static ClusterConnection getConnectionInternal(final Configuration conf)
    throws IOException {
    HConnectionKey connectionKey = new HConnectionKey(conf);
    synchronized (CONNECTION_INSTANCES) {
      HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
      if (connection == null) {
        // First request for this key: create and cache a connection
        connection = (HConnectionImplementation)createConnection(conf, true);
        CONNECTION_INSTANCES.put(connectionKey, connection);
      } else if (connection.isClosed()) {
        // Cached connection was closed: drop it and cache a fresh one
        ConnectionManager.deleteConnection(connectionKey, true);
        connection = (HConnectionImplementation)createConnection(conf, true);
        CONNECTION_INSTANCES.put(connectionKey, connection);
      }
      // ...
      return connection;
    }
  }
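
Stripped of HBase types, the caching logic above is a synchronized get-or-create map keyed by configuration. The sketch below illustrates that pattern only and is not HBase code: `ConnectionCacheSketch` and `FakeConnection` are hypothetical names, and a plain `String` key stands in for `HConnectionKey`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ConnectionCacheSketch {

  // Hypothetical stand-in for HConnectionImplementation.
  static final class FakeConnection implements AutoCloseable {
    private volatile boolean closed = false;
    boolean isClosed() { return closed; }
    @Override public void close() { closed = true; }
  }

  // Hypothetical stand-in for CONNECTION_INSTANCES; String keys replace HConnectionKey.
  private final Map<String, FakeConnection> instances = new HashMap<>();
  private final Function<String, FakeConnection> factory;
  private int created = 0;

  ConnectionCacheSketch(Function<String, FakeConnection> factory) {
    this.factory = factory;
  }

  // Mirrors getConnectionInternal: create on first use,
  // replace a closed cached connection, otherwise reuse the cached one.
  FakeConnection get(String key) {
    synchronized (instances) {
      FakeConnection connection = instances.get(key);
      if (connection == null || connection.isClosed()) {
        connection = factory.apply(key);
        created++;
        instances.put(key, connection);
      }
      return connection;
    }
  }

  int createdCount() {
    synchronized (instances) { return created; }
  }

  public static void main(String[] args) throws InterruptedException {
    ConnectionCacheSketch cache = new ConnectionCacheSketch(k -> new FakeConnection());
    // Eight threads asking for the same key end up sharing one connection.
    Thread[] threads = new Thread[8];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> cache.get("cluster-a"));
      threads[i].start();
    }
    for (Thread t : threads) t.join();
    System.out.println(cache.createdCount());                // 1
    cache.get("cluster-a").close();
    System.out.println(cache.get("cluster-a").isClosed());   // false: a fresh one replaced it
    System.out.println(cache.createdCount());                // 2
  }
}
```

Because every caller goes through the same lock and map, concurrent threads asking for the same key all receive the one cached instance, which is the property the 1.x `HTable` constructor relied on.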

Now, users have to create the connection themselves, using code like the following, as our documentation recommends:
    Connection connection = ConnectionFactory.createConnection(conf);
    Table table = connection.getTable(tableName);
They must also make sure that *only one* connection is created per *process*, instead of freely
creating HTable instances; otherwise many threads may each set up their own connections to
ZooKeeper and the RegionServers. Users may also ask "when should I close the connection?", and the
answer is "don't close it until the *process* shuts down".
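
Following that recommendation typically means holding one process-wide connection and closing it only when the process exits. Below is a minimal sketch of such a holder under stated assumptions: `ProcessConnectionHolder` is a hypothetical helper, and to stay self-contained it manages any `AutoCloseable` built by a `Supplier`. In a real client the supplier would wrap `ConnectionFactory.createConnection(conf)`, whose checked `IOException` would need handling inside the lambda.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: lazily creates exactly one shared resource
// and closes it from a JVM shutdown hook, never earlier.
public final class ProcessConnectionHolder<C extends AutoCloseable> {
  private final Supplier<C> factory;
  private volatile C instance;

  public ProcessConnectionHolder(Supplier<C> factory) {
    this.factory = factory;
  }

  // Double-checked locking: a volatile read on the hot path,
  // the lock is taken only around first creation.
  public C get() {
    C local = instance;
    if (local == null) {
      synchronized (this) {
        local = instance;
        if (local == null) {
          local = factory.get();
          instance = local;
          final C toClose = local;
          Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
              toClose.close();
            } catch (Exception e) {
              // Best effort: the process is exiting anyway.
            }
          }));
        }
      }
    }
    return local;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger created = new AtomicInteger();
    ProcessConnectionHolder<AutoCloseable> holder =
        new ProcessConnectionHolder<>(() -> {
          created.incrementAndGet();
          return () -> System.out.println("closed at shutdown");
        });
    Thread[] workers = new Thread[8];
    for (int i = 0; i < workers.length; i++) {
      workers[i] = new Thread(holder::get);
      workers[i].start();
    }
    for (Thread w : workers) w.join();
    System.out.println(created.get()); // 1
  }
}
```

This is the bookkeeping the text argues users now have to get right on their own; the holder itself must also be a single shared instance (for example a `static final` field), or the problem just moves one level up.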

So now there are many more things users must "make sure" of, but habits are hard to change.
Users used to create a table instance in each thread (depending on which table each request
accesses), so they will probably keep creating connections everywhere, and operators will then
have to chase down all kinds of resulting problems...

So I'm proposing to add back managed connection and connection caching support. IMHO it's a good
feature that already existed in our implementation, so let's bring it back and save operators
work when they decide to upgrade from 1.x to 2.x.
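
One way restored managed connections could also answer the "when should I close it?" question is reference counting: each table handle acquires the shared connection for its key and releases it when closed, and the last release closes the connection. The sketch below assumes that design; `ManagedConnectionPool`, `acquire`, and `release` are hypothetical names, not an existing or proposed HBase API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a keyed, reference-counted connection cache.
public final class ManagedConnectionPool<K, C extends AutoCloseable> {

  private static final class Entry<T> {
    final T connection;
    int refCount;
    Entry(T connection) { this.connection = connection; }
  }

  private final Map<K, Entry<C>> entries = new HashMap<>();
  private final Function<K, C> factory;

  public ManagedConnectionPool(Function<K, C> factory) {
    this.factory = factory;
  }

  // Returns the shared connection for key, creating it on first use.
  public synchronized C acquire(K key) {
    Entry<C> e = entries.computeIfAbsent(key, k -> new Entry<>(factory.apply(k)));
    e.refCount++;
    return e.connection;
  }

  // Drops one reference; the last release evicts and closes the connection.
  public synchronized void release(K key) {
    Entry<C> e = entries.get(key);
    if (e == null) {
      throw new IllegalStateException("release without acquire: " + key);
    }
    if (--e.refCount == 0) {
      entries.remove(key);
      try {
        e.connection.close();
      } catch (Exception ex) {
        throw new IllegalStateException("failed to close connection for " + key, ex);
      }
    }
  }

  public synchronized int openConnections() {
    return entries.size();
  }

  public static void main(String[] args) {
    ManagedConnectionPool<String, AutoCloseable> pool =
        new ManagedConnectionPool<>(k -> () -> System.out.println("closed " + k));
    AutoCloseable a = pool.acquire("cluster-1"); // first table handle
    AutoCloseable b = pool.acquire("cluster-1"); // second handle shares it
    System.out.println(a == b);                  // true
    pool.release("cluster-1");
    System.out.println(pool.openConnections());  // 1
    pool.release("cluster-1");                   // prints "closed cluster-1"
    System.out.println(pool.openConnections());  // 0
  }
}
```

With this design users could again create and close table handles freely per thread, while the number of live connections stays at one per key.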


This message was sent by Atlassian JIRA
