hbase-user mailing list archives

From Pankaj Misra <pankaj.mi...@impetus.co.in>
Subject RE: Thrift Gateway Server, ZooKeeper & HBase
Date Mon, 01 Oct 2012 10:04:52 GMT
Thank you very much Harsh, that's extremely helpful and clears up a lot for me.

Since I am running in pseudo-distributed mode, many things are mixed up; moving
to a small distributed setup will probably serve me better. Your note on scaling
Thrift servers independently was very helpful. While the Thrift servers would run on
machines that have HBase installed, as you indicated, they would also need ZooKeeper
connectivity, since they would go through ZooKeeper to interact with the HBase service nodes.

So it boils down to this: the Thrift server node may have a complete HBase install
with the exact configuration for ZooKeeper connectivity, but it will only run the
Thrift server and need not run any HBase service if we only intend to scale the
Thrift server alone.

Thank you again Harsh for your prompt help, and please do feel free to indicate if my understanding
above is incorrect or incomplete. Thanks.

Thanks and Regards
Pankaj Misra


________________________________________
From: Harsh J [harsh@cloudera.com]
Sent: Monday, October 01, 2012 3:00 PM
To: user@hbase.apache.org
Subject: Re: Thrift Gateway Server, ZooKeeper & HBase

Hi,

Inline.

On Mon, Oct 1, 2012 at 2:45 PM, Pankaj Misra <pankaj.misra@impetus.co.in> wrote:
> Dear All,
>
> I would like to request your help for clearing some doubts that I have around the deployment
> view of these components. I have been able to do some tests on my pseudo-distributed environment
> and have been able to get very good throughput using Thrift client and gateway server. I need
> your help to have a clear view of the deployment components, so that I can further elaborate
> my environment with a clear thought process.
>
> Based on my recent experiences on gateway based connectivity using thrift to access hbase
> regions, it occurs to me that in order to run a thrift server it has to be run on the hbase
> node itself. I am trying to envision the deployment view in context of thrift gateway server
> running on HBase node, ZooKeeper quorum and the HBase node themselves.

A thrift server needs connectivity to all HBase and ZK service/daemon
nodes, but does not need to be co-located with one.

> I am using a pseudo-distributed configuration of HBase 0.94.1 with Hadoop 0.23.1 natively
> compiled and have installed the thrift library as per the installation instructions. I also
> see that running gateway servers on HBase is a big plus for a highly multi-threaded environment
> as it takes advantage of thread pooling. So since I am running my setup in a pseudo-distributed
> mode, I have 1 node of HBase, 1 Zookeeper quorum, 1 region server, 1 NN, 1 DN and 1 SNN.
>
> So if I have to illustrate my thinking here, the steps that I perform to have HBase running
> with thrift gateway server are
> $HBASE_HOME/bin/start-hbase.sh  --> Starts the HBase node, Zookeeper Quorum & Region Server
> $HBASE_HOME/bin/hbase thrift start -threadpool  --> Starts the Thrift gateway server on hbase node
>
> This makes me think that the thrift server is tightly coupled with every instance of
> HBase node. If I just need to scale thrift server from a load balancing perspective, I cannot
> do it independent of HBase scaling, I will have to add another HBase node in the cluster to
> have another thrift server for scalability.

Do not conflate library dependency with service dependency; they are
different things.

You may _install_ HBase libs on any machine connected to the cluster,
and start _just_ the thrift server on it. The HBase thrift server does
need HBase libraries to run, but does not need a local service to run
alongside.
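
As a rough sketch of that setup (not an official procedure), a gateway-only
machine could start just the Thrift server as below. The helper name
start_thrift_only, the DRY_RUN switch and the /opt/hbase path are assumptions
made for illustration; hbase-daemon.sh is the daemon launcher shipped in
HBase's bin directory. It also assumes conf/hbase-site.xml on that machine
carries the cluster's hbase.zookeeper.quorum setting.

```shell
#!/bin/sh
# Hypothetical helper for a machine that has the HBase libraries and config
# installed but runs no HMaster/RegionServer. start-hbase.sh is never run here.

# Assumed install path; override by exporting HBASE_HOME before calling.
: "${HBASE_HOME:=/opt/hbase}"

start_thrift_only() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command that would be run.
        echo "$HBASE_HOME/bin/hbase-daemon.sh start thrift"
    else
        # Start the Thrift gateway as a background daemon on this machine.
        "$HBASE_HOME/bin/hbase-daemon.sh" start thrift
    fi
}
```

Calling start_thrift_only with DRY_RUN=0 on each extra gateway machine would add
Thrift capacity without adding an HBase service node.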

> Also with the above scenario in mind, what seems to me is that the thrift server which
> runs on HBase, requests zookeeper for the connection and zookeeper allocates and manages the
> connection lifecycle via native Java objects (HTable & HTablePool) for respective
> RegionServers based on key values. Based on my understanding, which may be incorrect, if thrift
> server has to run on HBase node, which would also be running region servers as well, why do the
> calls have to go through the zookeeper? Or is it that once the client makes a successful connection
> with a thrift server (on an HBase node), which may be initially mediated by Zookeeper for
> allocation, the client interaction happens directly with the thrift server?

If a thrift client is used, the client will only talk to the thrift
server. The client will not talk to ZooKeeper. The thrift server will
instead talk to ZooKeeper, HMaster and HRegionServers like a regular Java
client, and act as a 'gateway' for requests from thrift clients.

Does this help clear your questions?

--
Harsh J
