hbase-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Thrift Gateway Server, ZooKeeper & HBase
Date Mon, 01 Oct 2012 10:06:42 GMT
You've got it right, Pankaj :)

On Mon, Oct 1, 2012 at 3:34 PM, Pankaj Misra <pankaj.misra@impetus.co.in> wrote:
> Thank you very much, Harsh; that's extremely helpful and clears up a lot for me.
> Since I am running in pseudo-distributed mode, many things are mixed up; moving to a small distributed setup will probably be better for me. Your note on scaling Thrift servers independently was very helpful. While the Thrift servers would run on machines that have HBase installed, as you indicated, they would also need ZooKeeper connectivity, since they work through ZooKeeper to interact with the HBase service nodes.
> So it boils down to this: while a Thrift server node may have the complete HBase install with the exact configuration for ZooKeeper connectivity, such a node will run only the Thrift server and need not run the HBase service if we intend to scale the Thrift server alone.
> Thank you again, Harsh, for your prompt help, and please do feel free to indicate if my understanding above is incorrect or incomplete. Thanks.
> Thanks and Regards
> Pankaj Misra
> ________________________________________
> From: Harsh J [harsh@cloudera.com]
> Sent: Monday, October 01, 2012 3:00 PM
> To: user@hbase.apache.org
> Subject: Re: Thrift Gateway Server, ZooKeeper & HBase
> Hi,
> Inline.
> On Mon, Oct 1, 2012 at 2:45 PM, Pankaj Misra <pankaj.misra@impetus.co.in> wrote:
>> Dear All,
>> I would like to request your help in clearing up some doubts about the deployment view of these components. I have run some tests on my pseudo-distributed environment and got very good throughput using a Thrift client and gateway server. I need your help to get a clear view of the deployment components so that I can extend my environment with a clear understanding.
>> Based on my recent experience with gateway-based connectivity using Thrift to access HBase regions, it seems to me that a Thrift server has to run on the HBase node itself. I am trying to picture the deployment in terms of the Thrift gateway server running on an HBase node, the ZooKeeper quorum, and the HBase nodes themselves.
> A thrift server needs connectivity to all HBase and ZK service/daemon
> nodes, but does not need to be co-located with one.
>> I am using a pseudo-distributed configuration of HBase 0.94.1 with natively compiled Hadoop 0.23.1, and have installed the Thrift library as per the installation instructions. I also see that running gateway servers on HBase is a big plus for a highly multi-threaded environment, as it takes advantage of thread pooling. Since I am running my setup in pseudo-distributed mode, I have 1 HBase node, 1 ZooKeeper quorum, 1 region server, 1 NN, 1 DN and 1 SNN.
>> To illustrate my thinking, the steps I perform to get HBase running with the Thrift gateway server are:
>> $HBASE_HOME/bin/start-hbase.sh                     --> Starts the HBase node, ZooKeeper quorum & region server
>> $HBASE_HOME/bin/hbase thrift start -threadpool     --> Starts the Thrift gateway server on the HBase node
>> This makes me think that the Thrift server is tightly coupled to every HBase node. If I just need to scale the Thrift server for load balancing, I cannot do it independently of HBase scaling; I would have to add another HBase node to the cluster to get another Thrift server.
> Do not couple library dependency with service dependency - both are
> different things.
> You may _install_ HBase libs on any machine connected to the cluster,
> and start _just_ the thrift server on it. The HBase thrift server does
> need HBase libraries to run, but does not need a local service to run
> alongside.
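
A minimal sketch of the "libraries without a local service" setup described above. It assumes HBase is already unpacked on the gateway machine; the function name `write_thrift_node_site` and the example hostnames are hypothetical, and the generated file carries only the ZooKeeper quorum setting, so a real deployment would more likely copy the cluster's own `hbase-site.xml`.

```shell
#!/usr/bin/env bash
# Sketch: prepare a machine that runs only the HBase Thrift gateway.
# Hostnames and paths are placeholders, not values from this thread.

# Write a minimal client-side hbase-site.xml that points at the cluster's
# ZooKeeper quorum. $1 = conf directory, $2 = comma-separated quorum hosts.
write_thrift_node_site() {
  local conf_dir="$1" quorum="$2"
  mkdir -p "$conf_dir"
  cat > "$conf_dir/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>${quorum}</value>
  </property>
</configuration>
EOF
}

# Example (not run automatically):
#   write_thrift_node_site "$HBASE_HOME/conf" "zk1.example.com,zk2.example.com,zk3.example.com"
#   "$HBASE_HOME/bin/hbase-daemon.sh" start thrift   # starts only the Thrift server
```

No master or region server is started on this machine; only the Thrift daemon runs, using the local HBase libraries to reach the cluster.
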
>> Also, with the above scenario in mind, it seems to me that the Thrift server running on HBase asks ZooKeeper for the connection, and ZooKeeper allocates and manages the connection lifecycle via native Java objects (HTable & HTablePool) for the respective RegionServers based on key values. Based on my understanding, which may be incorrect: if the Thrift server has to run on an HBase node that is also running region servers, why do the calls have to go through ZooKeeper? Or is it that once the client makes a successful connection to a Thrift server (on an HBase node), which may initially be mediated by ZooKeeper for allocation, the client interacts directly with the Thrift server?
> If a thrift client is used, the client will only talk to thrift
> server. The client will not talk to ZooKeeper. The thrift server will
> talk to ZooKeeper, HMaster and HRegionServers like a regular Java
> client instead, and act as a 'gateway' for requests to thrift clients.
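
The connectivity described above can be written down as a small reachability table. This is only a sketch: the role names and the functions `endpoints_for` and `can_reach` are hypothetical, and the ports are assumed to be the stock defaults of the HBase 0.94 era (Thrift 9090, ZooKeeper 2181, HMaster 60000, HRegionServer 60020), which a given cluster may override.

```shell
#!/usr/bin/env bash
# Sketch of who connects to what. Ports are assumed defaults; adjust them
# if your configuration overrides them.

# Print the services a given role must be able to reach, one per line.
endpoints_for() {
  case "$1" in
    thrift-client)
      # A Thrift client talks only to the Thrift server.
      echo "thrift-server:9090"
      ;;
    thrift-server)
      # The gateway behaves like a regular Java client of the cluster.
      echo "zookeeper:2181"
      echo "hmaster:60000"
      echo "hregionserver:60020"
      ;;
    *)
      echo "unknown role: $1" >&2
      return 1
      ;;
  esac
}

# Return 0 if a TCP connection to host ($1) on port ($2) succeeds within
# 2 seconds. Relies on bash's /dev/tcp redirection, so it needs bash.
can_reach() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

From a client machine, something like `can_reach thrift1.example.com 9090` (hostname hypothetical) is the only check that needs to pass; the ZooKeeper, HMaster and HRegionServer checks belong on the Thrift server host.
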
> Does this help clear your questions?
> --
> Harsh J

Harsh J
