hbase-user mailing list archives

From "Michael Dagaev" <michael.dag...@gmail.com>
Subject Re: HTables Pool
Date Mon, 15 Dec 2008 19:28:37 GMT

    Could you explain, please, why the HBase RPC has such a
limitation? As I remember, HBase uses Thrift as an RPC mechanism. Is
it a Thrift limitation? Can it be changed?

I am also curious why HBase uses Thrift rather than Java RMI. I
can guess that RMI is not so good at bulk data transfer, but I do not
know for sure.

Thank you for your cooperation,

On Mon, Dec 15, 2008 at 6:23 PM, stack <stack@duboce.net> wrote:
> Sorry Slava, 'RPC Lock' was misinformation on our part.  Subsequent digging
> turned up the fact that RPC has a pool of Connections, one to each remote
> host.  Send and receive on this single connection is synchronous but
> otherwise, Connection is idle.  Primitive testing had it that there is a
> benefit to having multiple HTable instances.  How much, we have yet to
> ascertain; with many HTable instances, or when a large amount of data is
> being passed, there would start to be contention over the single Connection.
> We'll post when we have a better story than the above,
> Good stuff,
> St.Ack
> Slava Gorelik wrote:
>> As far as I know, HTable itself has a connection pool (HConnectionManager
>> is a singleton). I think multiple instances of HTable within the same
>> application will not help you.
>> You had better try to use multiple processes instead of multiple threads.
>> You can search the mailing list archive; I asked almost the same question.
>> The current HBase client implementation has an RPC lock, i.e.
>> multi-threading is not useful.
>> Best Regards.
>> On Mon, Dec 15, 2008 at 12:23 PM, Michael Dagaev
>> <michael.dagaev@gmail.com> wrote:
>>> Hi, all
>>>   Currently, we are using a single instance of HTable in a
>>> multithreaded application. That is, several threads use the same
>>> instance of HTable to insert data into the database. Since the
>>> "commit" method of HTable is synchronized, we are afraid that the single
>>> instance of HTable can be a bottleneck. So, we are going to create a
>>> pool of HTable instances (all instances are created with the same
>>> table name) and use the instances simultaneously (an instance per
>>> thread).
>>> Does it make sense?
>>> Thank you for your cooperation,
>>> M.
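
The pool described in the original question could be sketched roughly as
follows. This is only an illustration under stated assumptions, not code from
HBase or from anyone in this thread: TablePool, withTable and idleCount are
hypothetical names, and the type parameter T stands in for HTable so that the
sketch compiles without HBase on the classpath. As noted in the reply above,
all instances still share one Connection per remote host, so the benefit may
be limited.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool. T is a stand-in for HTable (assumption),
// so each borrowing thread gets exclusive use of one instance at a time.
final class TablePool<T> {
    private final BlockingQueue<T> idle;

    // Creates `size` instances up front using the supplied factory.
    TablePool(int size, Supplier<T> factory) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows an instance (blocking while none is idle), runs the work,
    // and always returns the instance to the pool afterwards.
    <R> R withTable(Function<T, R> work) {
        T table;
        try {
            table = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a table", e);
        }
        try {
            return work.apply(table);
        } finally {
            idle.add(table);
        }
    }

    // Number of instances currently not borrowed.
    int idleCount() {
        return idle.size();
    }
}
```

With HBase available, the factory would presumably construct an HTable for the
shared table name; the exact constructor depends on the HBase version, so it
is not shown here. Each worker thread would then wrap its inserts in a call to
withTable, so that no two threads use the same instance at once.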
