hbase-dev mailing list archives

From Jeff Whiting <je...@qualtrics.com>
Subject Re: HBase wire compatibility
Date Thu, 23 Feb 2012 21:40:59 GMT
Thanks for the explanation.  I enjoyed hearing your perspective.


On 2/22/2012 1:20 PM, tsuna wrote:
> On Thu, Feb 16, 2012 at 3:55 PM, Jeff Whiting<jeffw@qualtrics.com>  wrote:
>> It seems like the only heavy part of the client would be the zookeeper
>> interactions (forgive my ignorance if I'm wrong).
> ZooKeeper interactions are extremely simple for a client; that's not
> where the heavy part is.  All a client needs to do with ZooKeeper is
> find where the -ROOT- region is, period.  In the client I wrote,
> asynchbase, I don't even maintain an open connection to ZooKeeper,
> because 99.99% of the time it's unnecessary.
>>   Other than ZooKeeper, only a basic understanding of regions is
>> needed.  So if the ZooKeeper interactions could be removed and pushed
>> somewhere else in the stack, that could make the client much thinner.
> Using line count (per "wc -l") as a rough approximation of code
> complexity, here's a breakdown of asynchbase.  Out of a total of 11k
> lines, the big chunks of code are:
> ZooKeeper code: 360 lines (not actually big but I included it for comparison)
> Code for handling NoSuchRegionException: 500 lines
> Helper code to deal with byte arrays: 500 lines
> Helper code to deal with HBase RPC serialization: 700 lines
> Code to batch RPCs: 800 lines
> Low-level socket code, and wire serialization/deserialization: 800 lines
> Code to open, manage, close scanners: 1000 lines
> Code for looking up and caching regions: 1000 lines
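The region lookup and caching logic mentioned above boils down to a sorted map from region start key to region server: a floor lookup on the row key finds the region that contains the row. A minimal hypothetical sketch of that idea (class and method names are illustrative, not asynchbase's actual API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: cache regions by start key so that
// floorEntry(rowKey) returns the region whose start key is the
// greatest one <= the row key, i.e. the region containing the row.
class RegionCache {
    // region start key -> region server address
    private final ConcurrentSkipListMap<String, String> regions =
        new ConcurrentSkipListMap<>();

    void cacheRegion(String startKey, String server) {
        regions.put(startKey, server);
    }

    // Evict a stale entry (e.g. after a NoSuchRegionException) so the
    // next lookup falls through to a fresh .META. query.
    void invalidate(String startKey) {
        regions.remove(startKey);
    }

    // Returns the cached server for this row, or null on a cache miss,
    // which would trigger a .META. lookup in a real client.
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = regions.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }
}
```

The real code is much larger mostly because of concurrency, invalidation races, and the .META./-ROOT- lookup path around this map.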
>> hopefully never again.  IMHO, since you are redoing the communication,
>> why not improve the protocol to allow for a leaner client?  A leaner
>> client would be more likely to work across major HBase changes, would
>> be easier to maintain, would hide implementation details, and could
>> have fewer dependencies.
> Yes, a leaner client would be better.  But the reason the client is fat
> is that Bigtable's design pushed a lot of logic down to the clients
> in order to be able to make RPC routing decisions there, and relieve
> the tablet servers from having to do it.  When you start to have tens
> of thousands of clients talking to a cluster, like Google does, it
> makes sense to push this work down to the many clients, rather than
> have the fewer TabletServers do it and re-route packets (adding extra
> hops etc).  The overall system is more efficient this way.
> Leaner clients are better, but unfortunately lean clients are often
> dumb, so it's hard to find a good tradeoff between simplicity and
> efficiency.
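One concrete piece of that client-side smarts is the RPC batching mentioned in the breakdown above: instead of one round trip per edit, the client buffers edits and ships them to the region server in one multi-put. A hypothetical sketch of the buffering idea (asynchbase also flushes on a timer, omitted here; the names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buffer outgoing edits and flush them as one
// batch once a size threshold is reached, trading a little latency
// for far fewer round trips to the region server.
class RpcBatcher {
    private final int maxBatchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();

    RpcBatcher(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    void send(String rpc) {
        buffer.add(rpc);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // In a real client this would serialize one multi-put RPC and
        // write it to the region server's socket.
        flushed.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    List<List<String>> sentBatches() {
        return flushed;
    }
}
```

A dumb client sends each edit as its own RPC; a smart one like this has to track buffered state, which is part of why the batching code alone runs to hundreds of lines.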
>>   One of the reasons the client doesn't do well across major changes
>> is how heavy it is.  Even if the client is never implemented in
>> another language, a thinner client would seem to be an improvement.
> Having maintained an HBase client written from scratch for about 2
> years now, I can tell you that the only things I had to fix across
> HBase releases were wire-level serialization breakages.  The heavy
> logic of the client has remained mostly unchanged since the days of
> HBase 0.20.

Jeff Whiting
Qualtrics Senior Software Engineer
