Mailing-List: contact drill-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: drill-dev@incubator.apache.org
Received-SPF: pass (nike.apache.org: domain of pconstantine@gmail.com
 designates 74.125.82.43 as permitted sender)
MIME-Version: 1.0
Sender: pconstantine@gmail.com
In-Reply-To: 
 <CAK+1EbhmWhx2WHRXJwiU=HeqfeXY62NsbznhoN7ab5TYz02K5Q@mail.gmail.com>
References: <4C785CAB-FD0E-4C5A-8D83-7AD0B7752139@gmail.com>
	<BAY156-W326B4FDE04EE4B610CCF80B2AA0@phx.gbl>
	<CAJwFCa1s==kcWiYtMJ25ODwLyH3EJzdXk_iHVDgsyhTK=9MYyg@mail.gmail.com>
	<CF9B7AAB-444C-4CE0-8CC9-C35F14491016@gmail.com>
	<CAJwFCa3_1_GVc_wMTpN=qZR=NZLbFDEQ79d70QK_uy-06zT7OQ@mail.gmail.com>
	<CANwqCk-QpTA8Gf0V6cSm-PnF_hpP-nHeCgeP8dgr6gqAPEUXaQ@mail.gmail.com>
	<CAMHgjMrzwOXRxE=AaFtxhOcgT7=VFjvCccRLQExNwyE5U_xPbw@mail.gmail.com>
	<CAK+1EbhkkNDedWu9cav6BYiLqJVqnRmOkdUfhDAONKEYRSJkDw@mail.gmail.com>
	<CAJwFCa12OHWZuz40BJU3yNySm5Z7PBW7oi+QO0h8qhwvBT=ZRA@mail.gmail.com>
	<CAK+1EbhmWhx2WHRXJwiU=HeqfeXY62NsbznhoN7ab5TYz02K5Q@mail.gmail.com>
Date: Sat, 15 Sep 2012 01:24:01 +0300
Message-ID: 
 <CAOEg9Lo8pwkZmsBjpGZrxLLY+X+QcJi0tgoipR05bY1yTJJ4TQ@mail.gmail.com>
Subject: Re: Thrift?
From: Constantine Peresypkin <constantine@litestack.com>
To: drill-dev@incubator.apache.org
Content-Type: multipart/alternative; boundary=e0cb4e7004ed83060d04c9b0e15d

--e0cb4e7004ed83060d04c9b0e15d
Content-Type: text/plain; charset=ISO-8859-1

Protobuf is an attempt to make ASN.1 more developer-friendly (and not a bad
attempt).
It's simpler, has far fewer features, is easier to implement and has a compact
encoding.
But on the other hand it's non-standard, a "reinvented wheel" (they could have
just done a "better than PER" encoding for ASN.1), and AFAIK it has no support
for the new and shiny Google encodings, like "group varint".
All in all, in the current situation it seems a better choice than ASN.1, to
say nothing of something even more vague and non-standard like Thrift.
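
To make the encoding comparison concrete, here is a minimal sketch of the
base-128 varint that protobuf uses next to a "group varint". The function
names are made up for illustration, and the group varint tag layout (value i
takes bits 2i..2i+1 of the tag, bytes little-endian) is an assumption on my
part, not a reference implementation of any Google format.

```python
def encode_varint(n):
    """Base-128 varint: 7 payload bits per byte, high bit set if more bytes follow."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Return (value, next_position) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_group_varint(values):
    """Four uint32 values share one tag byte; each 2-bit field holds (byte length - 1).

    The bit order inside the tag is an assumption made for this sketch.
    """
    if len(values) != 4:
        raise ValueError("group varint packs exactly four values")
    tag = 0
    body = bytearray()
    for i, v in enumerate(values):
        if not 0 <= v < 2 ** 32:
            raise ValueError("values must fit in 32 bits")
        length = max(1, (v.bit_length() + 7) // 8)  # 1 to 4 bytes
        tag |= (length - 1) << (2 * i)
        body += v.to_bytes(length, "little")
    return bytes([tag]) + bytes(body)


def decode_group_varint(buf):
    """Inverse of encode_group_varint: read the tag, then four fixed-length values."""
    tag = buf[0]
    pos = 1
    values = []
    for i in range(4):
        length = ((tag >> (2 * i)) & 0x3) + 1
        values.append(int.from_bytes(buf[pos:pos + length], "little"))
        pos += length
    return values
```

With this sketch, encode_varint(300) yields the two bytes AC 02, and the group
[1, 300, 70000, 2**31] packs into 11 bytes (one tag byte plus 1 + 2 + 3 + 4
payload bytes). The point of the grouped form is that the decoder reads all
four lengths from the tag up front instead of testing a continuation bit on
every byte.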

On Sat, Sep 15, 2012 at 12:38 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Thanks for that Ted.
>
> Correct - internal wire format doesn't mean 'drill only supports
> protobuf encoded data'.
>
> Part of the reason to favor protobuf is that a lot of people in the
> broader 'big data' community are building a lot of experience with it.
>  Hadoop and HBase both are moving to/moved to protobuf on the wire.
> Being able to leverage this expertise is valuable.
>
> There is a JIRA in Hadoop-land where someone had done a deep dive
> 'bake off' between thrift, protobuf and avro.  The ultimate choice was
> protobuf for a number of reasons.  If people want to re-do the
> analysis, I'd like to see it in the context of THAT analysis (eg: why
> the assumptions there are not the same for Drill)... if anything it'd
> give a concrete form to what can be a mire.
>
> For what it's worth, I've had many discussions along these lines with
> a variety of people including committers on Thrift, and the consensus
> is both are good choices.
>
> -ryan
>
> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
> > I think that it is important to ask a few questions leading up to a
> > decision here.
> >
> > The first is a (rhetorical) show of hands about how many people believe
> > that there are no serious performance or expressivity killers when
> > comparing alternative serialization frameworks.  As far as I know,
> > performance differences are not massive (and protobufs is one of the
> > leaders in any case) and the expressivity differences are essentially
> > nil.  If somebody feels that there is a serious show-stopper with any
> > option, they should speak.
> >
> > The second is to ask the sense of the community: whether it judges
> > progress or perfection in this decision to be more important to the
> > project.  My guess is that almost everybody would prefer to see progress
> > as long as the technical choice is not subject to some horrid missing bit.
> >
> > The final question is whether it is reasonable to go along with protobufs
> > given that several very experienced engineers prefer it and would like to
> > produce code based on it.  If the first two questions are answered to the
> > effect that protobufs is about as good as we will find and that progress
> > trumps small differences, then it seems that moving to follow this
> > preference of Jason and Ryan for protobufs might be a reasonable thing to
> > do.
> >
> > The question of an internal wire format, btw, does not constrain the
> > project relative to external access.  I think it is important to support
> > JDBC and ODBC and whatever is in common use for querying.  For external
> > access the question is quite different.  Whereas for the internal format
> > consensus around a single choice has large benefits, the external format
> > choice is nearly the opposite.  For an external format, limiting
> > ourselves to a single choice seems like a bad idea and increasing the
> > audience seems like a better choice.
> >
> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <ryanobjc@gmail.com>
> > wrote:
> >
> >> Hi folks,
> >>
> >> I just commented on this first JIRA.  Here is my text:
> >>
> >> This issue has been hashed over a lot in the Hadoop projects. There
> >> was work done to compare thrift vs avro vs protobuf. The conclusion
> >> was to use protobuf.
> >>
> >> Prior to this move, there had been a lot of noise about pluggable RPC
> >> transports, and whatnot. It held up adoption of a backwards compatible
> >> serialization framework for a long time. The problem ended up being
> >> the analysis-paralysis, rather than the specific implementation
> >> problem. In other words, the problem was a LACK of implementation rather
> >> than actual REAL problems.
> >>
> >> Based on this experience, I'd strongly suggest adopting protobuf and
> >> moving on. Forget about pluggable RPC implementations, the complexity
> >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
> >> format for Hadoop and HBase, which allows Drill to draw on the broad
> >> experience of those communities who need to implement high performance
> >> backwards compatible RPC serialization.
> >>
> >> ====
> >>
> >> Expanding a bit, I've looked into this issue a lot, and there are very
> >> few significant concrete reasons to choose protobuf vs thrift.  A tiny
> >> percent faster at this or that, etc.  I'd strongly suggest protobuf
> >> for the expanded community.  There is no particular Apache imperative
> >> that Apache projects re-use libraries.  Use what makes sense for your
> >> project.
> >>
> >> As regards Avro, it's a fine serialization format for long term
> >> data retention, but the complexities that exist to enable that make it
> >> non-ideal for an RPC.  I know of no one who uses AvroRPC in any form.
> >>
> >> -ryan
> >>
> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <tshiran@maprtech.com>
> >> wrote:
> >> > We plan to propose the architecture and interfaces in the next couple
> >> > weeks, which will make it easy to divide the project into clear
> >> > building blocks. At that point it will be easier to start contributing
> >> > different data sources, data formats, operators, query languages, etc.
> >> >
> >> > The contributions are done in the usual Apache way. It's best to open
> >> > a JIRA and then post a patch so that others can review and then a
> >> > committer can check it in.
> >> >
> >> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia
> >> > <chandanmadhesia@gmail.com> wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> What is the process to become a contributor to drill?
> >> >>
> >> >> Regards
> >> >> chandan
> >> >>
> >> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <ted.dunning@gmail.com>
> >> >> wrote:
> >> >>
> >> >> > Suffice it to say that if *you* think it is important enough to
> >> >> > implement and maintain, then the group shouldn't say nay.  The
> >> >> > consensus stuff should only block things that break something else.
> >> >> > Additive features that are highly maintainable (or which come with
> >> >> > commitments) shouldn't generally be blocked.
> >> >> >
> >> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <
> >> >> > michael.hausenblas@gmail.com> wrote:
> >> >> >
> >> >> > > Good. Feel free to put me down for that, if the group as a whole
> >> >> > > thinks that (supporting Thrift) makes sense.
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Tomer Shiran
> >> > Director of Product Management | MapR Technologies | 650-804-8657
> >>
>

--e0cb4e7004ed83060d04c9b0e15d--