hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: modular build and pluggable rpc
Date Tue, 31 May 2011 20:42:15 GMT
The cost of serialization is non-trivial and a substantial part of the
expense of conveying information from regionserver -> client.  I did
some timings: sending data across the wire is surprisingly slow, but
attempting to compress it with various compression systems ended up
taking 50-100ms in the average case (1-5 MB Result[] sets).
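
A rough way to reproduce this kind of timing is sketched below.  It is
only an illustration: the class name, the synthetic payload and the
choice of gzip are assumptions, not the setup used for the numbers
above, and the result will vary with hardware, codec and data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical micro-benchmark: times gzip on a synthetic multi-MB buffer.
class CompressTiming {

    /** Gzip the payload in memory and return the compressed bytes. */
    public static byte[] gzip(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(payload.length / 2 + 64);
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /**
     * Build a stand-in for serialized results: random "value" bytes with a
     * repeated key-like prefix every 64 bytes.  Purely synthetic data.
     */
    public static byte[] fakeResultBytes(int size, long seed) {
        byte[] buf = new byte[size];
        new Random(seed).nextBytes(buf);
        byte[] prefix = "row-0000/cf:qualifier/".getBytes(StandardCharsets.US_ASCII);
        for (int off = 0; off + prefix.length <= size; off += 64) {
            System.arraycopy(prefix, 0, buf, off, prefix.length);
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] payload = fakeResultBytes(2 * 1024 * 1024, 42L); // 2 MB, inside the 1-5 MB range
        for (int i = 0; i < 5; i++) {
            gzip(payload); // warm up the JIT before timing
        }
        long start = System.nanoTime();
        byte[] compressed = gzip(payload);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("in=%d bytes, out=%d bytes, %d ms%n",
                payload.length, compressed.length, elapsedMs);
    }
}
```

A single timed run like this is noisy; a proper comparison would repeat
the measurement and use real Result[] bytes.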

Originally, when conceptualizing the thrift interface, the thought was
to just send the KeyValue byte[] over thrift as an opaque blob rather
than defining a whole structure, e.g. no KeyValue struct with a field
for each of the parts of a KeyValue.  On large results the cost of the
structured approach becomes prohibitive.
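
As a minimal sketch of the opaque-blob idea: the already-serialized
byte[] of each cell is length-prefixed and concatenated into one
buffer, which could then travel as a single binary field.  The class
and method names here are hypothetical and this is not HBase's actual
wire format.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical framing helper: packs pre-serialized cells into one blob.
class OpaqueBlobFraming {

    /** Concatenate cells as [4-byte length][bytes] records. */
    public static byte[] pack(List<byte[]> cells) {
        int total = 0;
        for (byte[] cell : cells) {
            total += 4 + cell.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] cell : cells) {
            buf.putInt(cell.length); // length prefix, big-endian by default
            buf.put(cell);           // the cell bytes, copied as-is
        }
        return buf.array();
    }

    /** Split a blob produced by pack() back into the original cells. */
    public static List<byte[]> unpack(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        List<byte[]> cells = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            byte[] cell = new byte[len];
            buf.get(cell);
            cells.add(cell);
        }
        return cells;
    }
}
```

The sender does one copy per cell and the RPC layer sees a single
binary value, instead of encoding and decoding every field of every
cell separately.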

While HTTP has a high overhead of headers, if one wanted to be
HTTP-oriented one could use SPDY: http://www.chromium.org/spdy

The nice thing is that HTTP has a good set of interops and the like.
The bad thing is it is too verbose.

-ryan

On Tue, May 31, 2011 at 1:22 PM, Stack <stack@duboce.net> wrote:
> On Mon, May 30, 2011 at 9:55 PM, Eric Yang <eyang@yahoo-inc.com> wrote:
>> Maven modularization could be enhanced to have a structure that looks like this:
>>
>> Super POM
>>  +- common
>>  +- shell
>>  +- master
>>  +- region-server
>>  +- coprocessor
>>
>> The software is basically grouped by process type (role of the process) plus a shared
>> library.
>>
>
> I'd change the list above.  shell should be client and perhaps master
> and regionserver should be both inside a single 'server' submodule.
> We need to add security in there.  Perhaps we'd have a submodule for
> thrift, avro, rest (and perhaps rest war file)?  (Is this too many
> submodules -- I suppose once we are submodularized, adding new ones
> is trivial.  It's the initial move to submodules that is painful.)
>
>
>> For RPC, there are several feasible options: avro, thrift and jackson+jersey (REST).
>> Avro may seem cumbersome because the schema is defined in a JSON string.  Thrift comes with its
>> own rpc server, and it is not trivial to add authorization and authentication to secure the rpc
>> transport.  Jackson+Jersey RPC has the biggest message size compared to Avro and thrift.
>> All three frameworks have pros and cons, but I think Jackson+jersey has the right balance
>> for an rpc framework.  In most cases, pluggable RPC can be narrowed down to two main
>> categories of use case:
>>
>> 1. Freedom to create the most efficient rpc, which is hard to integrate with everything else
>> because it's custom made.
>> 2. Being able to evolve message passing and versioning.
>>
>> If we can see beyond the first reason and realize that the second reason is in part polymorphic
>> serialization, then Jackson+Jersey is probably the better choice as an RPC framework,
>> because Jackson supports polymorphic serialization and Jersey builds on the HTTP protocol.  It
>> would be easier to version and to add security on top of existing standards.  The syntax
>> and feature set seem more proper from an engineering standpoint to me.
>>
>
> I always considered http attractive but much too heavy-weight for hbase
> rpc; each request/response would carry a bunch of what are for the
> most part extraneous headers.  I suppose we should just measure.
> Regarding JSON messages, that's interesting, but hbase is all about binary
> data.  Does jackson/jersey do BSON?
>
> St.Ack
>
