hbase-dev mailing list archives

From Eric Yang <ey...@yahoo-inc.com>
Subject Re: modular build and pluggable rpc
Date Tue, 31 May 2011 04:55:24 GMT
The Maven modularization could be enhanced to have a structure that looks like this:

Super POM
  +- common
  +- shell
  +- master
  +- region-server
  +- coprocessor

The software is basically grouped by process type (the role of the process), plus a shared library.
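
A minimal sketch of what the parent (super) POM could look like for that layout. The groupId, version, and module names here are assumptions taken from the tree above, not an actual proposal text:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.92.0-SNAPSHOT</version>
  <!-- packaging "pom" marks this as an aggregator, not a jar -->
  <packaging>pom</packaging>

  <!-- One module per process role, plus the shared library -->
  <modules>
    <module>common</module>
    <module>shell</module>
    <module>master</module>
    <module>region-server</module>
    <module>coprocessor</module>
  </modules>
</project>
```

With this layout, `mvn install` at the top builds every role, while a downstream project can depend on just `common`.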

For RPC, there are several feasible options: Avro, Thrift, and Jackson+Jersey (REST).  Avro
may seem cumbersome because the schema is defined as a JSON string.  Thrift comes with its
own RPC server, and it is not trivial to add authorization and authentication to secure the
RPC transport.  A Jackson+Jersey RPC message has the largest message size compared to Avro
and Thrift.  All three frameworks have pros and cons, but I think Jackson+Jersey strikes the
right balance for an RPC framework.  In most use cases, pluggable RPC can be narrowed down
to two main categories:

1. Freedom to create the most efficient RPC possible, but hard to integrate with everything
else because it's custom made.
2. Being able to evolve message passing and versioning.

If we can see beyond the first reason and realize that the second is, in part, polymorphic
serialization, then Jackson+Jersey is probably the better choice as an RPC framework: Jackson
supports polymorphic serialization, and Jersey builds on the HTTP protocol.  It would be
easier to handle versioning and to add security on top of existing standards.  The syntax
and feature set seem more properly engineered to me.
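
To make the polymorphic-serialization point concrete, here is a hand-rolled sketch (plain JDK, no Jackson; the class, field, and tag names are illustrative) of the kind of type-tagged envelope that makes message evolution tractable -- Jackson automates this with its @JsonTypeInfo support:

```java
// Illustrative sketch only: a type tag ("@type") and version embedded in
// the JSON payload let the receiver pick the right deserializer without
// out-of-band schema negotiation. Jackson does this automatically; the
// point here is just the wire shape, spelled out with the JDK alone.
public class TypedEnvelope {

    // Sender side: wrap a serialized message body with a type tag
    // and a schema version.
    public static String wrap(String type, int version, String bodyJson) {
        return String.format(
            "{\"@type\":\"%s\",\"@version\":%d,\"body\":%s}",
            type, version, bodyJson);
    }

    // Receiver side: recover the tag to dispatch on (crude string scan,
    // just to show the idea -- a real codec would use a JSON parser).
    public static String typeOf(String envelope) {
        int start = envelope.indexOf("\"@type\":\"") + 9;
        int end = envelope.indexOf('"', start);
        return envelope.substring(start, end);
    }

    public static void main(String[] args) {
        String env = wrap("GetRequest", 1, "{\"row\":\"r1\"}");
        System.out.println(env);
        System.out.println(typeOf(env));
    }
}
```

Because the tag and version travel inside the message, old and new clients can coexist on the same transport, which is the versioning property argued for above.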

Regards,
Eric

On 5/27/11 7:11 PM, "Joey Echeverria" <joey@cloudera.com> wrote:

+1 on maven modules. That will simplify the native code build/integration
that I'm working on for HBASE-1316.

-Joey
On May 27, 2011 6:15 PM, "Ryan Rawson" <ryanobjc@gmail.com> wrote:
> The build modules are fine, I just wanted to voice my opinions on avro
> vs thrift. I don't think we should spend a lot of time attempting to
> build an avro vs thrift thing; we should plan to eventually move to
> thrift as our RPC serialization. I also concur with Todd, our server
> side code has had a lot of work and it isn't half bad now :-)
>
> +1 to maven modules, they are pretty cool
>
> On Fri, May 27, 2011 at 2:38 PM, Andrew Purtell <apurtell@apache.org>
wrote:
>> I don't disagree with any of this but the fact is we have compile time
differences if going against secure Hadoop 0.20 or non-secure Hadoop 0.20.
>>
>> So either we decide to punt on integration with secure Hadoop 0.20 or we
deal with the compile time differences. If dealing with them, we can do it
by reflection, which is brittle and can be difficult to understand and
debug, and someone would have to do this work; or we can wholesale replace
RPC with something based on Thrift, and someone would have to do the work;
or we take the pluggable RPC changes that Gary has already developed and
modularize the build, which Eric has already volunteered to do.
>>
>>  - Andy
>>
>> --- On Fri, 5/27/11, Todd Lipcon <todd@cloudera.com> wrote:
>>
>>> From: Todd Lipcon <todd@cloudera.com>
>>> Subject: Re: modular build and pluggable rpc
>>> To: dev@hbase.apache.org
>>> Cc: apurtell@apache.org
>>> Date: Friday, May 27, 2011, 1:30 PM
>>> Agreed - I'm all for Thrift.
>>>
>>> Though, I actually, contrary to Ryan, think that the
>>> existing HBaseRPC
>>> handler/client code is pretty good -- better than the
>>> equivalents from
>>> Thrift Java.
>>>
>>> We could start by using Thrift serialization on our
>>> existing transport
>>> -- then maybe work towards contributing it upstream to the
>>> Thrift
>>> project. HDFS folks are potentially interested in doing
>>> that as well.
>>>
>>> -Todd
>>>
>>> On Fri, May 27, 2011 at 1:10 PM, Ryan Rawson <ryanobjc@gmail.com>
>>> wrote:
>>> > I'm -1 on avro as an RPC format.  Thrift is the way to
>>> go; any of the
>>> > advantages of avro's smaller serialization are lost to
>>> the sheer
>>> > complexity of avro and therefore the potential bugs.
>>> >
>>> > I understand the desire to have a pluggable RPC
>>> engine, but it feels
>>> > like the better approach would be to adopt a unified
>>> RPC and just be
>>> > done with it.  I had a look at the HsHa mechanism in
>>> thrift and it is
>>> > very good, it in fact matches our 'handler' approach -
>>> async
>>> > receiving/sending of data, but single threaded for
>>> processing a
>>> > message.
>>> >
>>> > -ryan
>>> >
>>> > On Fri, May 27, 2011 at 1:00 PM, Andrew Purtell <apurtell@apache.org>
>>> wrote:
>>> >> Also needing, perhaps later, consideration:
>>> >>
>>> >> - HDFS-347 or not
>>> >>
>>> >>  - Lucene embedding for hbase-search, though as a
>>> coprocessor this is already pretty much handled if we have
>>> platform support (therefore a platform module) for an HDFS
>>> that can do local read shortcutting and block placement
>>> requests
>>> >>
>>> >> - HFile v1 versus v2
>>> >>
>>> >> Making decoupled development at several downstream
>>> sites manageable, with a home upstream for all the work,
>>> while simultaneously providing clean migration paths for
>>> users, basically.
>>> >>
>>> >> --- On Fri, 5/27/11, Andrew Purtell <apurtell@apache.org>
>>> wrote:
>>> >>
>>> >>> From: Andrew Purtell <apurtell@apache.org>
>>> >>> Subject: modular build and pluggable rpc
>>> >>> To: dev@hbase.apache.org
>>> >>> Date: Friday, May 27, 2011, 12:49 PM
>>> >>> From IRC:
>>> >>>
>>> >>> apurtell    i propose we take the build
>>> modular as early as possible to deal with multiple platform
>>> targets
>>> >>> apurtell    secure vs nonsecure
>>> >>> apurtell    0.20 vs 0.22 vs trunk
>>> >>> apurtell    i understand the maintenance
>>> issues with multiple rpc engines, for example, but a lot of
>>> reflection twistiness is going to be worse
>>> >>> apurtell    i propose we take up esammer on
>>> his offer
>>> >>> apurtell    so branch 0.92 asap, get trunk
>>> modular and working against multiple platform targets
>>> >>> apurtell    especially if we're going to
>>> see rpc changes coming from downstream projects...
>>> >>> apurtell    also what about supporting
>>> secure and nonsecure clients with the same deployment?
>>> >>> apurtell    zookeeper does this
>>> >>> apurtell    so that is selectable rpc
>>> engine per connection, with a negotiation
>>> >>> apurtell    we don't have or want to be
>>> crazy about it but a rolling upgrade should be possible if
>>> for example we are taking in a new rpc from fb (?) or
>>> cloudera (avro based?)
>>> >>> apurtell    also looks like hlog modules
>>> for 0.20 vs 0.22 and successors
>>> >>> apurtell    i think over time we can
>>> roadmap the rpc engines, if we have multiple, by
>>> deprecation
>>> >>> apurtell    now that we're on the edge of
>>> supporting both 0.20 and 0.22, and secure vs nonsecure,
>>> let's get it as manageable as possible right away
>>> >>>
>>> >>> St^Ack_        apurtell: +1
>>> >>>
>>> >>> apurtell    also i think there is some
>>> interest in async rpc engine
>>> >>>
>>> >>> St^Ack_        we should stick this up
>>> on dev i'd say
>>> >>>
>>> >>> Best regards,
>>> >>>
>>> >>>     - Andy
>>> >>>
>>> >>> Problems worthy of attack prove their worth by
>>> hitting
>>> >>> back. - Piet Hein (via Tom White)
>>> >>>
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>

