incubator-drill-dev mailing list archives

From Constantine Peresypkin <pconstant...@gmail.com>
Subject Re: Thrift?
Date Fri, 14 Sep 2012 22:59:07 GMT
I really have no idea how one can estimate telco traffic.
But I highly doubt that you can fruitfully compare the reliability of an
internal-only protocol (same implementation, easy to enforce compatibility)
to that of an interoperable one.

On Sat, Sep 15, 2012 at 1:41 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> I didn't say I was the one making the argument...
>
> Google has put probably > 10^24 bytes of data thru protobuf in
> multiple implementations (eg: serialization on disk and on wire RPC).
>   That is a low estimate.
>
> I'd be interested in hearing what 20 years of telco protocol traffic
> might compare to 10 years of google's usage of protobuf.  Exponential
> curve and all of that.
>
>
>
>
>
> On Fri, Sep 14, 2012 at 3:36 PM, Constantine Peresypkin
> <pconstantine@gmail.com> wrote:
> > More battle tested than a more than 20 year old standard used in almost
> > every telecom protocol that exists nowadays?
> > I think your statement is a little on the "too bold" side. :)
> >
> > On Sat, Sep 15, 2012 at 1:30 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >
> >> Funny thing, given how much use protobufs has been put thru, I think
> >> one could make the argument it's more battle tested than ASN.1 ...
> >>
> >> On Fri, Sep 14, 2012 at 3:24 PM, Constantine Peresypkin
> >> <constantine@litestack.com> wrote:
> >> > Protobuf is an attempt to make ASN.1 more developer friendly (not a
> >> > bad attempt).
> >> > It's simpler, has far fewer features, is easier to implement and has a
> >> > compact encoding.
> >> > But on the other hand it's a non-standard "reinvented wheel" (they
> >> > could just have done a "better than PER" encoding for ASN.1), and AFAIK
> >> > it has no support for the new and shiny Google encodings, like
> >> > "group varint".
> >> > All in all, in the current situation it seems a better choice than
> >> > ASN.1, not to mention something even more vague and non-standard like
> >> > Thrift.
> >> >
> >> > On Sat, Sep 15, 2012 at 12:38 AM, Ryan Rawson <ryanobjc@gmail.com>
> >> wrote:
> >> >
> >> >> Thanks for that Ted.
> >> >>
> >> >> Correct - internal wire format doesn't mean 'drill only supports
> >> >> protobuf encoded data'.
> >> >>
> >> >> Part of the reason to favor protobuf is that a lot of people in the
> >> >> broader 'big data' community are building a lot of experience with it.
> >> >> Hadoop and HBase both are moving to/moved to protobuf on the wire.
> >> >> Being able to leverage this expertise is valuable.
> >> >>
> >> >> There is a JIRA in Hadoop-land where someone had done a deep dive
> >> >> 'bake off' between thrift, protobuf and avro.  The ultimate choice was
> >> >> protobuf for a number of reasons.  If people want to re-do the
> >> >> analysis, I'd like to see it in the context of THAT analysis (eg: why
> >> >> the assumptions there are not the same for Drill)... if anything it'd
> >> >> give a concrete form to what can be a mire.
> >> >>
> >> >> For what it's worth, I've had many discussions along these angles with
> >> >> a variety of people including committers on Thrift, and the consensus
> >> >> is both are good choices.
> >> >>
> >> >> -ryan
> >> >>
> >> >> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <ted.dunning@gmail.com>
> >> >> wrote:
> >> >> > I think that it is important to ask a few questions leading up to a
> >> >> > decision here.
> >> >> >
> >> >> > The first is a (rhetorical) show of hands about how many people
> >> >> > believe that there are no serious performance or expressivity killers
> >> >> > when comparing alternative serialization frameworks.  As far as I
> >> >> > know, performance differences are not massive (and protobufs is one
> >> >> > of the leaders in any case) and the expressivity differences are
> >> >> > essentially nil.  If somebody feels that there is a serious
> >> >> > show-stopper with any option, they should speak.
> >> >> >
> >> >> > The second is to ask the sense of the community whether they judge
> >> >> > progress or perfection in this decision is most important to the
> >> >> > project.  My guess is that almost everybody would prefer to see
> >> >> > progress as long as the technical choice is not subject to some
> >> >> > horrid missing bit.
> >> >> >
> >> >> > The final question is whether it is reasonable to go along with
> >> >> > protobufs given that several very experienced engineers prefer it and
> >> >> > would like to produce code based on it.  If the first two questions
> >> >> > are answered to the effect that protobufs is about as good as we will
> >> >> > find and that progress trumps small differences, then it seems that
> >> >> > following this preference of Jason and Ryan for protobufs might be a
> >> >> > reasonable thing to do.
> >> >> >
> >> >> > The question of an internal wire format, btw, does not constrain the
> >> >> > project relative to external access.  I think it is important to
> >> >> > support JDBC and ODBC and whatever is in common use for querying.
> >> >> > For external access the question is quite different.  Whereas for the
> >> >> > internal format consensus around a single choice has large benefits,
> >> >> > the external format choice is nearly the opposite.  For an external
> >> >> > format, limiting ourselves to a single choice seems like a bad idea
> >> >> > and increasing the audience seems like a better choice.
> >> >> >
> >> >> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <ryanobjc@gmail.com>
> >> >> wrote:
> >> >> >
> >> >> >> Hi folks,
> >> >> >>
> >> >> >> I just commented on this first JIRA.  Here is my text:
> >> >> >>
> >> >> >> This issue has been hashed over a lot in the Hadoop projects. There
> >> >> >> was work done to compare thrift vs avro vs protobuf. The conclusion
> >> >> >> was to use protobuf.
> >> >> >>
> >> >> >> Prior to this move, there had been a lot of noise about pluggable
> >> >> >> RPC transports, and whatnot. It held up adoption of a backwards
> >> >> >> compatible serialization framework for a long time. The problem
> >> >> >> ended up being the analysis-paralysis, rather than the specific
> >> >> >> implementation problem. In other words, the problem was a LACK of
> >> >> >> implementation rather than actual REAL problems.
> >> >> >>
> >> >> >> Based on this experience, I'd strongly suggest adopting protobuf
> >> >> >> and moving on. Forget about pluggable RPC implementations, the
> >> >> >> complexity doesn't deliver benefits. The benefit of protobuf is
> >> >> >> that it's the RPC format for Hadoop and HBase, which allows Drill to
> >> >> >> draw on the broad experience of those communities who need to
> >> >> >> implement high performance backwards compatible RPC serialization.
> >> >> >>
> >> >> >> ====
> >> >> >>
> >> >> >> Expanding a bit, I've looked into this issue a lot, and there are
> >> >> >> very few significant concrete reasons to choose protobuf vs thrift.
> >> >> >> Tiny percent faster of this, and that, etc.  I'd strongly suggest
> >> >> >> protobuf for the expanded community.  There is no particular Apache
> >> >> >> imperative that Apache projects re-use libraries.  Use what makes
> >> >> >> sense for your project.
> >> >> >>
> >> >> >> As regards Avro, it's a fine serialization format for long term
> >> >> >> data retention, but the complexities that exist to enable that make
> >> >> >> it non-ideal for an RPC.  I know of no one who uses AvroRPC in any
> >> >> >> form.
> >> >> >>
> >> >> >> -ryan
> >> >> >>
> >> >> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <tshiran@maprtech.com> wrote:
> >> >> >> > We plan to propose the architecture and interfaces in the next
> >> >> >> > couple weeks, which will make it easy to divide the project into
> >> >> >> > clear building blocks. At that point it will be easier to start
> >> >> >> > contributing different data sources, data formats, operators,
> >> >> >> > query languages, etc.
> >> >> >> >
> >> >> >> > The contributions are done in the usual Apache way. It's best to
> >> >> >> > open a JIRA and then post a patch so that others can review and
> >> >> >> > then a committer can check it in.
> >> >> >> >
> >> >> >> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <chandanmadhesia@gmail.com> wrote:
> >> >> >> >
> >> >> >> >> Hi
> >> >> >> >>
> >> >> >> >> What is the process to become a contributor to Drill?
> >> >> >> >>
> >> >> >> >> Regards
> >> >> >> >> chandan
> >> >> >> >>
> >> >> >> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >> >> >> >>
> >> >> >> >> > Suffice it to say that if *you* think it is important enough to
> >> >> >> >> > implement and maintain, then the group shouldn't say nay.  The
> >> >> >> >> > consensus stuff should only block things that break something
> >> >> >> >> > else.  Additive features that are highly maintainable (or which
> >> >> >> >> > come with commitments) shouldn't generally be blocked.
> >> >> >> >> >
> >> >> >> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <michael.hausenblas@gmail.com> wrote:
> >> >> >> >> >
> >> >> >> >> > > Good. Feel free to put me down for that, if the group as a
> >> >> >> >> > > whole thinks that (supporting Thrift) makes sense.
> >> >> >> >> > >
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > Tomer Shiran
> >> >> >> > Director of Product Management | MapR Technologies |
> 650-804-8657
> >> >> >>
> >> >>
> >>
>
