impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-2.5.0_5.7.0) IMPALA-2982: lazy initialization of HBase connection
Date Mon, 29 Feb 2016 18:37:39 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-2982: lazy initialization of HBase connection
......................................................................


Patch Set 1:

(5 comments)

> (2 comments)
 > 
 > The single long-lived application-wide connection is actually the
 > recommended mode of use for HBase connections:
 > https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Connection.html
 > 
 > It has a bunch of logic to handle HBase servers coming and going,
 > so I think initialising the connection at impalad startup is
 > essentially the same as initialising it at the first HBase query.

Sounds good to me. But we should run it on a HBase cluster just to be sure and if we have
the time, try out things like cycle the Hbase service while impala is running etc. I can help
in this regard if you don't have the time.

http://gerrit.cloudera.org:8080/#/c/2345/1/be/src/runtime/hbase-table-factory.cc
File be/src/runtime/hbase-table-factory.cc:

Line 40: connection_cl_ =
       :     env->FindClass("org/apache/hadoop/hbase/client/Connection");
Single line.


Line 52: lock_guard<mutex> lock(connection_lock_);
> How can the connection_ member get modified by the caller? We copy the poin
You're right. I was just thinking along the lines of a concurrent GetConnection() and destruction.


Line 91: RETURN_IF_ERROR(
       :       JniUtil::LocalToGlobalRef(env, local_connection, &connection_));
Can fit in a single line.


Line 104: if (connection_ != NULL) {
> I'd rather use locks here because it's easier to reason about, and because 
It wouldn't matter if we're holding a lock and concurrent initialization happens. It would
still continue with the initialization after this releases the lock would have the same effect.

But as you said since this isn't a frequently visited code path, we can just leave it as a
lock.

Which leads me to think, should we find some way of signaling GetConnection() the the connection_
has been closed? Or NULL out connection_ if we need it to be reinitialized again.

It also seems odd that if there are 2 HBaseTableFactory objects and one gets destroyed, both
of them would lose the connection, since conenction_ is a static member.


http://gerrit.cloudera.org:8080/#/c/2345/1/be/src/runtime/hbase-table-factory.h
File be/src/runtime/hbase-table-factory.h:

Line 41:   /// Opens the connections to HBase and Zookeeper on the first call.
Add "Returns existing connection on subsequent calls".


-- 
To view, visit http://gerrit.cloudera.org:8080/2345
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I405f5cbfc62dd4a4aeb6f4019cac358376478884
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.0
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message