hbase-issues mailing list archives

From "Yu Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17009) Revisiting the removement of managed connection and connection caching
Date Fri, 04 Nov 2016 08:34:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15635667#comment-15635667

Yu Li commented on HBASE-17009:

Back on this (sorry for the lag [~enis])...

I checked HBASE-16713 and have the questions below:
bq. Connection.getConnection() is removed in master for good reasons. The connection lifecycle
should always be explicit
Would you mind explaining these "good reasons" further? As mentioned in the description, I see
some downsides to exposing the connection lifecycle to end users. Or maybe I'm misunderstanding
what being *explicit* means?

bq. Looking at the old code that we have in branch-1 for ConnectionManager, managed connections
and the code in ConnectionCache, I think we should do a first-class client level API called
ConnectionCache which will be a hybrid between ConnectionCache and old ConnectionManager.
Would you mind giving more details about this "hybrid"?

Basically, I think we should supply two kinds of API for getting a connection:
1. {{createConnection}}, just like the existing one, through which users manage the connection
lifecycle themselves. This is mainly for advanced users.
2. {{getConnection}}, like the one we had before, through which the connection lifecycle is
auto-managed and users don't need to worry about it. This is compatible with the old HTable usage
and would make migration from older versions smoother. It is also friendlier to junior users.
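
To make the difference between the two styles concrete, here is a minimal, self-contained sketch
of how they could coexist. It is an illustration under assumptions, not the HBase client code:
{{ManagedConnectionSketch}}, {{FakeConnection}}, {{Lease}} and the string cluster key are all
hypothetical names invented for this example, and the reference counting only loosely mirrors
what the old {{ConnectionManager}} did.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: FakeConnection, Lease and the cluster key are
// stand-ins invented for illustration, not HBase client classes.
final class ManagedConnectionSketch {

    /** Stand-in for an expensive, thread-safe cluster connection. */
    static final class FakeConnection {
        private boolean closed;
        synchronized boolean isClosed() { return closed; }
        synchronized void close() { closed = true; }
    }

    /** Cache entry: one shared connection plus the number of open leases. */
    private static final class Entry {
        final FakeConnection conn = new FakeConnection();
        int refCount;
    }

    /** Handle returned by getConnection(); closing it releases one reference. */
    static final class Lease implements AutoCloseable {
        private final String key;
        private final Entry entry;
        private boolean released;

        private Lease(String key, Entry entry) {
            this.key = key;
            this.entry = entry;
        }

        FakeConnection connection() { return entry.conn; }

        @Override
        public void close() {
            synchronized (CACHE) {
                if (released) return;          // closing a lease twice is harmless
                released = true;
                if (--entry.refCount == 0) {   // last user gone: really close
                    entry.conn.close();
                    CACHE.remove(key, entry);  // removes only if still mapped to this entry
                }
            }
        }
    }

    private static final Map<String, Entry> CACHE = new HashMap<>();

    private ManagedConnectionSketch() {}

    /** Style 1: the caller owns the connection and must close it. */
    static FakeConnection createConnection(String clusterKey) {
        return new FakeConnection();
    }

    /** Style 2: one shared, reference-counted connection per cluster key. */
    static Lease getConnection(String clusterKey) {
        synchronized (CACHE) {
            Entry e = CACHE.get(clusterKey);
            if (e == null || e.conn.isClosed()) {
                e = new Entry();
                CACHE.put(clusterKey, e);
            }
            e.refCount++;
            return new Lease(clusterKey, e);
        }
    }
}
```

With something along these lines, each worker thread could wrap its work in
{{try (Lease l = ManagedConnectionSketch.getConnection(key)) { ... }}} and never decide when the
underlying connection is torn down, while advanced users keep full control through
{{createConnection}}.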

We also need to point out both options explicitly in our refguide/docs so users can choose the
way they prefer. Thoughts?


> Revisiting the removement of managed connection and connection caching
> ----------------------------------------------------------------------
>                 Key: HBASE-17009
>                 URL: https://issues.apache.org/jira/browse/HBASE-17009
>             Project: HBase
>          Issue Type: Task
>          Components: Operability
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Critical
> In HBASE-13197 we did lots of good cleanup of the Connection API, but as part of it
HBASE-13252 dropped managed connections and connection caching. This JIRA proposes to revisit
that decision for the reasons below.
> Assume a long-running process with multiple threads accessing HBase (a common case for
streaming applications), and compare what happened previously with what happens now.
> Previously:
> Users could create an HTable instance whenever they wanted without worrying about the
underlying connection, because the HBase client managed it automatically: no matter how many
threads there were, a given configuration mapped to only one Connection instance
> {code}
>   @Deprecated
>   public HTable(Configuration conf, final TableName tableName)
>   throws IOException {
>     ...
>     this.connection = ConnectionManager.getConnectionInternal(conf);
>     ...
>   }
>   static ClusterConnection getConnectionInternal(final Configuration conf)
>     throws IOException {
>     HConnectionKey connectionKey = new HConnectionKey(conf);
>     synchronized (CONNECTION_INSTANCES) {
>       HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
>       if (connection == null) {
>         connection = (HConnectionImplementation)createConnection(conf, true);
>         CONNECTION_INSTANCES.put(connectionKey, connection);
>       } else if (connection.isClosed()) {
>         ConnectionManager.deleteConnection(connectionKey, true);
>         connection = (HConnectionImplementation)createConnection(conf, true);
>         CONNECTION_INSTANCES.put(connectionKey, connection);
>       }
>       connection.incCount();
>       return connection;
>     }
>   }
> {code}
> Now:
> Users have to create the connection themselves, using code like the following, as our
recommendations indicate
> {code}
>     Connection connection = ConnectionFactory.createConnection(conf);
>     Table table = connection.getTable(tableName);
> {code}
> And they must make sure that *only one* connection is created per *process* instead
of creating HTable instances freely, or else with multiple threads many connections may be set
up to zookeeper/RS. Users might also ask "when should I close the connection?" and
the answer is "make sure you don't close it until the *process* shuts down"
> So now there are many more things for users to "make sure" of, but habits are hard
to change. Users are used to creating a table instance in each thread (depending on which table
each request accesses), so they will probably still create connections everywhere, and then
operators will have to scramble to resolve all kinds of problems...
> So I'm proposing to add back managed connection and connection caching support. IMHO
it's a good feature that already existed in our implementation, so let's bring it back and save
operators the work when they decide to upgrade from 1.x to 2.x
> Thoughts?

This message was sent by Atlassian JIRA
