From "Liu, Ming (HPIT-GADSC)" <ming.l...@hp.com>
Subject RE: HTable or HConnectionManager, how a client connect to HBase?
Date Tue, 17 Feb 2015 03:56:08 GMT

I had to spend a lot of time looking into the source code of HTable and HConnectionManager.

IMHO, the documentation on the HBase website seems misleading. The online book at
http://hbase.apache.org/book.html#architecture.client says:
For example, this is preferred:

HBaseConfiguration conf = HBaseConfiguration.create();
HTable table1 = new HTable(conf, "myTable");
HTable table2 = new HTable(conf, "myTable");

as opposed to this:

HBaseConfiguration conf1 = HBaseConfiguration.create();
HTable table1 = new HTable(conf1, "myTable");
HBaseConfiguration conf2 = HBaseConfiguration.create();
HTable table2 = new HTable(conf2, "myTable");
After checking the source code, it seems that only in the 0.20 code must HTable instances use the
same Configuration instance in order to share the HConnection. 0.20 uses the Configuration instance
as the key of the HashMap that stores the HConnections. I checked the 0.90.0 code, and it already
uses HConnectionKey as the key of the HashMap that stores the shared HConnections.

So as far as I understand, the documentation is NOT accurate for HBase 0.90 and later. Both
examples can share an HConnection instance. If I am wrong, please correct me.
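
To make the difference concrete, here is a toy model in plain Java. It is not HBase code: the
class ConnectionCacheSketch, FakeConnection, and the use of a Map<String, String> as a stand-in
for Configuration are all hypothetical, and the two caches only mimic the keying behaviour as I
described it above (key by instance versus key by property values).

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical classes, not taken from the HBase source.
class ConnectionCacheSketch {
    // Stand-in for a connection object.
    static final class FakeConnection {}

    // "0.20-style" as described above: the configuration instance itself is the key,
    // so only the very same object in memory maps to the same connection.
    static final Map<Map<String, String>, FakeConnection> BY_INSTANCE = new IdentityHashMap<>();

    // "0.90-style" as described above: the key is derived from the property values,
    // so two distinct but equal configurations map to the same connection.
    static final Map<Map<String, String>, FakeConnection> BY_VALUE = new HashMap<>();

    static FakeConnection lookupByInstance(Map<String, String> conf) {
        return BY_INSTANCE.computeIfAbsent(conf, c -> new FakeConnection());
    }

    static FakeConnection lookupByValue(Map<String, String> conf) {
        // Snapshot the properties so later changes to conf do not alter the stored key.
        Map<String, String> key = new TreeMap<>(conf);
        return BY_VALUE.computeIfAbsent(key, k -> new FakeConnection());
    }

    public static void main(String[] args) {
        Map<String, String> conf1 = new HashMap<>();
        conf1.put("hbase.zookeeper.quorum", "zk1");
        Map<String, String> conf2 = new HashMap<>(conf1); // equal content, different object

        System.out.println("shared when keyed by instance: "
                + (lookupByInstance(conf1) == lookupByInstance(conf2)));
        System.out.println("shared when keyed by values:   "
                + (lookupByValue(conf1) == lookupByValue(conf2)));
    }
}
```

In this sketch, conf1 and conf2 play the role of the two separately created configurations in
the second documentation example.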

As for my previous question: if two HTable instances already share the HConnection, why do I need
to create an HConnection first with HConnectionManager.createConnection()?
From reading the source code, it seems that HTable.close() also closes the HConnection, so once one
table is closed, the following HTable has to reconnect and nothing is shared. But if the HTable is
obtained from HConnection.getTable(), a special HTable constructor is used that makes sure
HTable.close() does NOT close the connection. So the HConnection
can be shared.
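
The close behaviour I describe above can be sketched with another toy model. Again, this is not
the HTable source: SharedConnectionSketch, FakeConnection, FakeTable, and the closeConnectionOnClose
flag are hypothetical names chosen only to illustrate my reading of the code, in which a table
created directly owns its connection while a table obtained from a connection does not.

```java
// Toy model only: hypothetical classes, not taken from the HBase source.
class SharedConnectionSketch {
    static final class FakeConnection {
        private boolean closed = false;

        // Tables handed out by the connection never close it.
        FakeTable getTable(String name) {
            return new FakeTable(this, name, false);
        }

        void close() { closed = true; }

        boolean isClosed() { return closed; }
    }

    static final class FakeTable {
        private final FakeConnection connection;
        private final String name;
        private final boolean closeConnectionOnClose;

        private FakeTable(FakeConnection connection, String name, boolean closeConnectionOnClose) {
            this.connection = connection;
            this.name = name;
            this.closeConnectionOnClose = closeConnectionOnClose;
        }

        // Models creating a table directly: it makes its own connection and owns it.
        static FakeTable standalone(String name) {
            return new FakeTable(new FakeConnection(), name, true);
        }

        FakeConnection connection() { return connection; }

        String name() { return name; }

        // Closing the table closes the connection only if the table owns it.
        void close() {
            if (closeConnectionOnClose) {
                connection.close();
            }
        }
    }

    public static void main(String[] args) {
        FakeConnection shared = new FakeConnection();
        FakeTable t1 = shared.getTable("hbase_table1");
        t1.close();
        System.out.println("shared connection closed after t1.close(): " + shared.isClosed());

        FakeTable t2 = FakeTable.standalone("hbase_table1");
        t2.close();
        System.out.println("own connection closed after t2.close():    " + t2.connection().isClosed());

        shared.close(); // the caller stays responsible for the shared connection
    }
}
```

The point of the sketch is the ownership rule: whoever creates the shared connection is the one
who closes it, after all tables obtained from it are done.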

I will use the recommended method. As discussed in another thread here, to share an HConnection
one still has to ensure that the shared connection is not closed. So HConnectionManager
is a good abstraction for controlling the life cycle of a connection. I think I understand now.


I am using HBase 0.98.6.

I learned from this mailing list earlier that the recommended way to 'connect' to HBase from
a client is to use HConnectionManager like this:
    HConnection con = HConnectionManager.createConnection(configuration);
    HTableInterface table = con.getTable("hbase_table1");
instead of:
    HTableInterface table = new HTable(configuration, "hbase_table1");

I don't quite understand the reason. I was thinking that each time I instantiate an HTable,
it needs to create a new HConnection, and that is expensive. With the first method, multiple
HTable instances can share the same HConnection. That seems quite reasonable to me.
However, I have read in some articles on the internet that even if I use the 'new HTable(conf,
tbl)' method, all the HTable instances will still share the same HConnection as long as the
'conf' object is the same one. I recently read yet another article which said that when using
'new HTable(conf, tbl)', one doesn't need to use exactly the same 'conf' object (the same one in
memory). If two different 'conf' objects are equal, meaning all their attributes are the same
(for example, both were created from the same hbase-site.xml and never changed), then the HTable
objects can still share the same HConnection. I also tried to read the HTable source code. It is
very hard, but it seems to me that the last statement is correct: 'HTable will share HConnection,
if configuration is all the same'.

Sorry for being so verbose. My questions:
If two 'configuration' objects are the same, can two HTable objects instantiated from them
respectively, directly with 'new HTable()', still share the same HConnection or not?
If the answer is 'yes', then why do I still need HConnectionManager to create a shared connection?
I am talking about 0.98.6.
I googled for days and even tried to read the HBase source code, but I am still really confused.
I also tried to run some tests, but since I am such a newbie, I don't know how to verify the
difference; I really don't know what an HConnection does under the hood. I counted the ZooKeeper
client requests and found some difference. If this difference in ZooKeeper requests is a valid
metric, it suggests to me that two HTable instances do not share an HConnection even when the same
'configuration' is used in the constructor. So I got more and more confused....

Could someone please help me with this newbie question? Thanks in advance.


