hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: HTable thread safety in 0.20.6
Date Mon, 07 Mar 2011 06:17:23 GMT
So when you look at the interface that the client uses to talk to the
regionservers, it has calls like this:

  public <R> MultiResponse multi(MultiAction<R> multi) throws IOException;

  public long openScanner(final byte [] regionName, final Scan scan)
      throws IOException;

etc

Note that this is the interface you get _AFTER_ you are talking to a
particular regionserver.  If you send a regionName that is not being
served, you get a 'region not served' exception.

In other words, a blind client wouldn't know which servers to talk to.
You have to first:
- bootstrap the ROOT table's regionserver location from ZK (there is
only one, and there will only ever be one)
- get the META region(s) location(s).
- query the META region(s) to find out which server contains the
region for the specific request.
- talk to the individual regionserver. If you get exceptions, do the
lookup in META again and try again.

Putting these smarts in the client makes it scalable, at the cost of a
thicker client.
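
Very roughly, that lookup-and-retry flow is something like the sketch
below. Note that locateRegion, getConnection and clearCachedLocation are
made-up stand-ins for what the client does internally, not the actual API:

  // Illustrative only: how a client call routes itself to the right regionserver.
  Result get(byte[] tableName, Get get) throws IOException {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      // Steps 1-3: ZK -> ROOT -> META -> region location (cached after the first lookup)
      HRegionLocation loc = locateRegion(tableName, get.getRow());
      HRegionInterface server = getConnection(loc.getServerAddress());
      try {
        // Step 4: talk straight to the regionserver that owns the row
        return server.get(loc.getRegionInfo().getRegionName(), get);
      } catch (NotServingRegionException nsre) {
        // Region moved or split: drop the cached location, re-read META, retry
        clearCachedLocation(tableName, get.getRow());
      }
    }
    throw new IOException("retries exhausted");
  }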

To make an API that has a 'one-shot' type of interface, we'd end up
creating something that looks like the thrift gateway.  But then you
have bottlenecks in the thrift gateway servers.
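
For contrast, a gateway client really can be that thin; it just opens a
socket to whichever gateway it is pointed at. Sketch only: the argument
types of the generated Hbase.Client differ between releases, so treat the
getRow call as approximate (9090 is the usual thrift gateway port):

  // All of the routing smarts live in the gateway process,
  // which is exactly where the bottleneck concern comes from.
  TTransport transport = new TSocket("localhost", 9090);
  Hbase.Client client = new Hbase.Client(new TBinaryProtocol(transport));
  transport.open();
  List<TRowResult> rows =
      client.getRow(Bytes.toBytes("mytable"), Bytes.toBytes("myrow"));
  transport.close();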

There really is no free lunch. Sorry.


On Sun, Mar 6, 2011 at 10:09 PM, Suraj Varma <svarma.ng@gmail.com> wrote:
> Sorry - missed the user group in my previous mail.
> --Suraj
>
> On Sun, Mar 6, 2011 at 10:07 PM, Suraj Varma <svarma.ng@gmail.com> wrote:
>
>> Very interesting.
>> I was just about to send an additional mail asking why the HBase client also
>> needs the hadoop jar (thereby tying the client to the hadoop version as
>> well) - but I guess the hadoop rpc is at least the dependency. So now
>> that makes sense.
>>
>>
>> > One strategy is to deploy gateways on all client nodes and use localhost
>> > as much as possible.
>>
>> This certainly scales up the gateway nodes - but it complicates upgrades.
>> For instance, we will have 100+ clients talking to the cluster, and
>> upgrading from 0.20.x to 0.90.x would be that much harder with
>> version-specific gateway nodes all over the place.
>>
>> > So again, avro isn't going to be a magic bullet. Neither is thrift.
>> This is interesting (disappointing?) ... isn't the plan to substitute
>> hadoop rpc with avro (or thrift) while still keeping all the smart logic in
>> the client in place? I thought avro with its cross-version capabilities
>> would have solved the versioning issues and allowed backward/forward
>> compatibility. I mean, a "thick" client talking avro was what I had imagined
>> the solution to be.
>>
>> Glad to know that client compatibility is very much on the committers' /
>> community's mind.
>>
>> Based on the discussion below, is asynchbase a "thick" / smart client or
>> something less than that?
>> >> 2) Does asynchbase have any limitations (functionally or otherwise) compared
>> >> to the native HBase client?
>>
>> Thanks again.
>> --Suraj
>>
>>
>> On Sun, Mar 6, 2011 at 9:40 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>
>>> On Sun, Mar 6, 2011 at 9:25 PM, Suraj Varma <svarma.ng@gmail.com> wrote:
>>> > Thanks all for your insights into this.
>>> >
>>> > I would agree that providing mechanisms to support no-outage upgrades going
>>> > forward would really be widely beneficial. I was looking forward to Avro for
>>> > this reason.
>>> >
>>> > Some follow up questions:
>>> > 1) If the asynchbase client can do this (i.e. talk the wire protocol and adjust based
>>> > on server versions), why not the native hbase client? Is there something in
>>> > the native client design that would make this too hard / not worth
>>> > emulating?
>>>
>>> Typically this has not been an issue.  The particular design of
>>> hadoop rpc (the rpc we use) makes it difficult to offer
>>> multiple protocol/version support. To "fix" it would more or less
>>> require rewriting the entire protocol stack. I'm glad we spent serious
>>> time making the base storage layer and query paths fast, since without
>>> those fundamentals a "better" RPC would be moot. From my measurements
>>> I don't think we are losing a lot of performance in our current RPC
>>> system, and unless we are very careful we'll lose a lot in a
>>> thrift/avro transition.
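>>>
>>> To make that concrete, the version handshake boils down to roughly the
>>> following (paraphrased, not the actual source): each interface carries a
>>> single long versionID, and the client is rejected unless it matches the
>>> server's exactly.
>>>
>>>   long serverVersion = proxy.getProtocolVersion(protocol.getName(), clientVersion);
>>>   if (serverVersion != clientVersion) {
>>>     throw new RPC.VersionMismatch(protocol.getName(), clientVersion, serverVersion);
>>>   }
>>>   // Any interface change bumps versionID, so an old client simply cannot
>>>   // talk to a new server; there is no per-method or per-version negotiation.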
>>>
>>>
>>> > 2) Does asynchbase have any limitations (functionally or otherwise) compared
>>> > to the native HBase client?
>>> >
>>> > 3) If Avro were the "native" protocol that HBase & the client talk through,
>>> > that is one thing (and that's what I'm hoping we end up with) - however,
>>> > doesn't spinning up Avro gateways on each node (like what is currently
>>> > available) require folks to scale up two layers (Avro gateway layer + HBase
>>> > layer)? i.e. now we need to be worried about whether the Avro gateways can
>>> > handle the traffic, etc.
>>>
>>> The hbase client is fairly 'thick': it must intelligently route
>>> between different regionservers, handle errors, re-look up metadata,
>>> use zookeeper to bootstrap, etc. This is part of making a scalable
>>> client, though. Having the RPC serialization in thrift or avro would
>>> make it easier to write those kinds of clients for non-Java languages.
>>> The gateway approach will probably be necessary for a while, alas. At
>>> SU I am not sure that the gateway is adding a lot of latency to
>>> small queries, since average/median latency is around 1ms.  One
>>> strategy is to deploy gateways on all client nodes and use localhost
>>> as much as possible.
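>>>
>>> From the application's point of view all of that is hidden behind HTable;
>>> roughly (0.90-style API, table/row/quorum names are just placeholders):
>>>
>>>   Configuration conf = HBaseConfiguration.create();
>>>   conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");  // ZK is all it needs to bootstrap
>>>   HTable table = new HTable(conf, "mytable");
>>>   Result r = table.get(new Get(Bytes.toBytes("myrow")));
>>>   // region lookup, caching, retries and failover all happen inside the client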
>>>
>>> > In our application, we have Java clients talking directly to HBase. We
>>> > debated using a Thrift or Stargate layer (even though we have a Java client)
>>> > just because of this easier upgrade-ability. But we finally decided to use
>>> > the native HBase client because we didn't want to have to scale two layers
>>> > rather than just HBase ... and Avro was on the road map. An HBase client
>>> > talking native Avro directly to the RS (i.e. without intermediate "gateways")
>>> > would have worked - but that was a ways ...
>>>
>>> So again, avro isn't going to be a magic bullet. Neither is thrift.  You
>>> can't just have a dumb client with little logic open up a socket and
>>> start talking to HBase.  That isn't congruent with a scalable system,
>>> unfortunately. You need your clients to be smart and do a bunch of
>>> work that otherwise would have to be done by a centralized node
>>> or another middleman. Only if the client is smart can we send the
>>> minimum number of RPCs over the shortest network path. Other systems have
>>> servers bounce requests to other servers, but that generates
>>> extra traffic in exchange for a simpler client.
>>>
>>> > I think now that we are in the .90s, an option to do no-outage upgrades
>>> > (from client's perspective) would be really beneficial.
>>>
>>> We'd all like this; it's foremost in pretty much every committer's mind
>>> all the time. It's just a HUGE body of work, one that is fraught with
>>> perils and danger zones. For example, it seemed avro would reign
>>> supreme, but the RPC landscape is shifting back towards thrift.
>>>
>>> >
>>> > Thanks,
>>> > --Suraj
>>> >
>>> >
>>> > On Sat, Mar 5, 2011 at 2:21 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>> >
>>> >> On Sat, Mar 5, 2011 at 2:10 PM, Ryan Rawson <ryanobjc@gmail.com>
>>> wrote:
>>> >> > As for the past RPC, it's all very well to complain that we didn't spend
>>> >> > more time making it more compatible, but in a world where evolving
>>> >> > features in an early platform is more important than keeping backwards
>>> >> > compatibility (how many hbase 18 jars want to talk to a modern
>>> >> > cluster? Like none.), I am confident we made the right choice.  Moving
>>> >> > forward, I think the goal should NOT be to keep the current system
>>> >> > compatible at all costs, but to look at things like avro and thrift,
>>> >> > make a calculated engineering tradeoff and get ourselves onto an
>>> >> > extendable platform, even if there is a flag day.  We aren't out of
>>> >> > the woods yet, but eventually we will be.
>>> >>
>>> >> Hear hear! +1!
>>> >>
>>> >> -Todd
>>> >> --
>>> >> Todd Lipcon
>>> >> Software Engineer, Cloudera
>>> >>
>>> >
>>>
>>
>>
>
