incubator-drill-dev mailing list archives

From Constantine Peresypkin <pconstant...@gmail.com>
Subject Re: Thrift?
Date Fri, 14 Sep 2012 22:36:26 GMT
More battle-tested than a more than 20-year-old standard used in almost every
telecom protocol that exists nowadays?
I think your statement is a little on the "too bold" side. :)

On Sat, Sep 15, 2012 at 1:30 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Funny thing, given how much use protobufs has been put thru, I think
> one could make the argument it's more battle-tested than ASN.1 ...
>
> On Fri, Sep 14, 2012 at 3:24 PM, Constantine Peresypkin <constantine@litestack.com> wrote:
> > Protobuf is an attempt to make ASN.1 more developer-friendly (not a bad
> > attempt).
> > It's simpler, has far fewer features, is easier to implement and has a
> > compact encoding.
> > But on the other hand it's non-standard, a "reinvented wheel" (they could
> > have just done a "better than PER" encoding for ASN.1), and AFAIK it has no
> > support for the new and shiny Google encodings, like "group varint".
> > All in all, in the current situation it seems a better choice than ASN.1,
> > let alone something even more vague and non-standard like Thrift.
> >
> > On Sat, Sep 15, 2012 at 12:38 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >
> >> Thanks for that Ted.
> >>
> >> Correct: an internal wire format doesn't mean 'Drill only supports
> >> protobuf-encoded data'.
> >>
> >> Part of the reason to favor protobuf is that a lot of people in the
> >> broader 'big data' community are building a lot of experience with it.
> >> Hadoop and HBase have both moved, or are moving, to protobuf on the wire.
> >> Being able to leverage this expertise is valuable.
> >>
> >> There is a JIRA in Hadoop-land where someone did a deep-dive
> >> 'bake-off' between Thrift, protobuf and Avro.  The ultimate choice was
> >> protobuf for a number of reasons.  If people want to redo the
> >> analysis, I'd like to see it in the context of THAT analysis (e.g., why
> >> the assumptions there are not the same for Drill)... if anything, it'd
> >> give concrete form to what can otherwise be a mire.
> >>
> >> For what it's worth, I've had many discussions along these lines with
> >> a variety of people, including committers on Thrift, and the consensus
> >> is that both are good choices.
> >>
> >> -ryan
> >>
> >> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >> > I think that it is important to ask a few questions leading up to a
> >> > decision here.
> >> >
> >> > The first is a (rhetorical) show of hands about how many people believe
> >> > that there are no serious performance or expressivity killers when
> >> > comparing alternative serialization frameworks.  As far as I know,
> >> > performance differences are not massive (and protobufs is one of the
> >> > leaders in any case) and the expressivity differences are essentially
> >> > nil.  If somebody feels that there is a serious show-stopper with any
> >> > option, they should speak up.
> >> >
> >> > The second is to ask the sense of the community whether they judge
> >> > progress or perfection in this decision to be more important to the
> >> > project.  My guess is that almost everybody would prefer to see progress
> >> > as long as the technical choice is not subject to some horrid missing bit.
> >> >
> >> > The final question is whether it is reasonable to go along with
> >> > protobufs, given that several very experienced engineers prefer it and
> >> > would like to produce code based on it.  If the first two questions are
> >> > answered to the effect that protobufs is about as good as we will find
> >> > and that progress trumps small differences, then it seems that following
> >> > Jason and Ryan's preference for protobufs might be a reasonable thing to
> >> > do.
> >> >
> >> > The question of an internal wire format, btw, does not constrain the
> >> > project relative to external access.  I think it is important to support
> >> > JDBC and ODBC and whatever is in common use for querying.  For external
> >> > access the question is quite different.  Whereas for the internal format
> >> > consensus around a single choice has large benefits, for the external
> >> > format it is nearly the opposite.  For an external format, limiting
> >> > ourselves to a single choice seems like a bad idea and increasing the
> >> > audience seems like a better choice.
> >> >
> >> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >> >
> >> >> Hi folks,
> >> >>
> >> >> I just commented on this first JIRA.  Here is my text:
> >> >>
> >> >> This issue has been hashed over a lot in the Hadoop projects. There
> >> >> was work done to compare Thrift vs Avro vs protobuf. The conclusion
> >> >> was to use protobuf.
> >> >>
> >> >> Prior to this move, there had been a lot of noise about pluggable RPC
> >> >> transports and whatnot. It held up adoption of a backwards-compatible
> >> >> serialization framework for a long time. The problem ended up being
> >> >> analysis paralysis rather than any specific implementation problem.
> >> >> In other words, the problem was a LACK of implementation rather than
> >> >> actual REAL problems.
> >> >>
> >> >> Based on this experience, I'd strongly suggest adopting protobuf and
> >> >> moving on. Forget about pluggable RPC implementations; the complexity
> >> >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
> >> >> format for Hadoop and HBase, which allows Drill to draw on the broad
> >> >> experience of those communities, who need to implement high-performance,
> >> >> backwards-compatible RPC serialization.
> >> >>
> >> >> ====
> >> >>
> >> >> Expanding a bit, I've looked into this issue a lot, and there are very
> >> >> few significant, concrete reasons to choose protobuf vs Thrift.  A tiny
> >> >> percent faster at this and that, etc.  I'd strongly suggest protobuf
> >> >> for the expanded community.  There is no particular Apache imperative
> >> >> that Apache projects reuse libraries.  Use what makes sense for your
> >> >> project.
> >> >>
> >> >> As regards Avro, it's a fine serialization format for long-term
> >> >> data retention, but the complexities that exist to enable that make it
> >> >> less than ideal for RPC.  I know of no one who uses AvroRPC in any form.
> >> >>
> >> >> -ryan
> >> >>
> >> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <tshiran@maprtech.com> wrote:
> >> >> > We plan to propose the architecture and interfaces in the next couple
> >> >> > of weeks, which will make it easy to divide the project into clear
> >> >> > building blocks. At that point it will be easier to start contributing
> >> >> > different data sources, data formats, operators, query languages, etc.
> >> >> >
> >> >> > Contributions are made in the usual Apache way. It's best to open a
> >> >> > JIRA and then post a patch so that others can review it and then a
> >> >> > committer can check it in.
> >> >> >
> >> >> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <chandanmadhesia@gmail.com> wrote:
> >> >> >
> >> >> >> Hi
> >> >> >>
> >> >> >> What is the process to become a contributor to Drill?
> >> >> >>
> >> >> >> Regards
> >> >> >> chandan
> >> >> >>
> >> >> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >> >> >>
> >> >> >> > Suffice it to say that if *you* think it is important enough to
> >> >> >> > implement and maintain, then the group shouldn't say nay.  The
> >> >> >> > consensus stuff should only block things that break something else.
> >> >> >> > Additive features that are highly maintainable (or which come with
> >> >> >> > commitments) shouldn't generally be blocked.
> >> >> >> >
> >> >> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <michael.hausenblas@gmail.com> wrote:
> >> >> >> >
> >> >> >> > > Good. Feel free to put me down for that, if the group as a whole
> >> >> >> > > thinks that (supporting Thrift) makes sense.
> >> >> >> > >
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Tomer Shiran
> >> >> > Director of Product Management | MapR Technologies | 650-804-8657
> >> >>
> >>
>
