incubator-drill-dev mailing list archives

From Clark Yang (杨卓荦) <yangzhuo...@gmail.com>
Subject Re: Thrift?
Date Sat, 15 Sep 2012 01:56:33 GMT
protobuf +1
I don't think the lack of a formal standard is a problem. protobuf has already
shown many benefits and proven successful in many open source projects. It's
widely used, and I think there are few better alternatives.

BTW, I have posted the first comment on the first JIRA.

Cheers,
Zhuoluo (Clark) Yang



2012/9/15 Constantine Peresypkin <pconstantine@gmail.com>

> I really have no idea how one can estimate telco traffic.
> But I highly doubt that you can fruitfully compare the reliability of an
> internal-only protocol (same implementation, easy to enforce compatibility)
> to that of an interoperable one.
>
> On Sat, Sep 15, 2012 at 1:41 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>
> > I didn't say I was the one making the argument...
> >
> > Google has put probably > 10^24 bytes of data thru protobuf in
> > multiple implementations (eg: serialization on disk and on wire RPC).
> >   That is a low estimate.
> >
> > I'd be interested in hearing how 20 years of telco protocol traffic
> > might compare to 10 years of Google's usage of protobuf.  Exponential
> > curve and all of that.
> >
> >
> >
> >
> >
> > On Fri, Sep 14, 2012 at 3:36 PM, Constantine Peresypkin
> > <pconstantine@gmail.com> wrote:
> > > More battle tested than a standard more than 20 years old that is used in
> > > almost every telecom protocol that exists nowadays?
> > > I think your statement is a little on the "too bold" side. :)
> > >
> > > On Sat, Sep 15, 2012 at 1:30 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > >
> > >> Funny thing, given how much use protobufs has been put thru, I think
> > >> one could make the argument it's more battle tested than ASN.1 ...
> > >>
> > >> On Fri, Sep 14, 2012 at 3:24 PM, Constantine Peresypkin
> > >> <constantine@litestack.com> wrote:
> > >> > Protobuf is an attempt to make ASN.1 more developer friendly (not a bad
> > >> > attempt).
> > >> > It's simpler, has far fewer features, is easier to implement, and has a
> > >> > compact encoding.
> > >> > But on the other hand it's non-standard, a "reinvented wheel" (they could
> > >> > just have done a "better than PER" encoding for ASN.1), and AFAIK it has
> > >> > no support for the new and shiny Google encodings, like "group varint".
> > >> > All in all, in the current situation it seems a better choice than ASN.1,
> > >> > not to mention something even more vague and non-standard such as Thrift.
> > >> >
> > >> > On Sat, Sep 15, 2012 at 12:38 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > >> >
> > >> >> Thanks for that Ted.
> > >> >>
> > >> >> Correct - internal wire format doesn't mean 'Drill only supports
> > >> >> protobuf encoded data'.
> > >> >>
> > >> >> Part of the reason to favor protobuf is that a lot of people in the
> > >> >> broader 'big data' community are building a lot of experience with it.
> > >> >> Hadoop and HBase both are moving to/moved to protobuf on the wire.
> > >> >> Being able to leverage this expertise is valuable.
> > >> >>
> > >> >> There is a JIRA in Hadoop-land where someone had done a deep dive
> > >> >> 'bake off' between thrift, protobuf and avro.  The ultimate choice was
> > >> >> protobuf for a number of reasons.  If people want to re-do the
> > >> >> analysis, I'd like to see it in the context of THAT analysis (eg: why
> > >> >> the assumptions there are not the same for Drill)... if anything it'd
> > >> >> give a concrete form to what can be a mire.
> > >> >>
> > >> >> For what it's worth, I've had many discussions along these lines with
> > >> >> a variety of people including committers on Thrift, and the consensus
> > >> >> is both are good choices.
> > >> >>
> > >> >> -ryan
> > >> >>
> > >> >> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > >> >> > I think that it is important to ask a few questions leading up to a
> > >> >> > decision here.
> > >> >> >
> > >> >> > The first is a (rhetorical) show of hands about how many people believe
> > >> >> > that there are no serious performance or expressivity killers when
> > >> >> > comparing alternative serialization frameworks.  As far as I know,
> > >> >> > performance differences are not massive (and protobufs is one of the
> > >> >> > leaders in any case) and the expressivity differences are essentially
> > >> >> > nil.  If somebody feels that there is a serious show-stopper with any
> > >> >> > option, they should speak.
> > >> >> >
> > >> >> > The second is to ask the sense of the community whether they judge
> > >> >> > progress or perfection in this decision to be most important to the
> > >> >> > project.  My guess is that almost everybody would prefer to see progress
> > >> >> > as long as the technical choice is not subject to some horrid missing bit.
> > >> >> >
> > >> >> > The final question is whether it is reasonable to go along with
> > >> >> > protobufs given that several very experienced engineers prefer it and
> > >> >> > would like to produce code based on it.  If the first two questions are
> > >> >> > answered to the effect that protobufs is about as good as we will find
> > >> >> > and that progress trumps small differences, then it seems that
> > >> >> > following this preference of Jason and Ryan for protobufs might be a
> > >> >> > reasonable thing to do.
> > >> >> >
> > >> >> > The question of an internal wire format, btw, does not constrain the
> > >> >> > project relative to external access.  I think it is important to support
> > >> >> > JDBC and ODBC and whatever is in common use for querying.  For external
> > >> >> > access the question is quite different.  Whereas for the internal format
> > >> >> > consensus around a single choice has large benefits, the external format
> > >> >> > choice is nearly the opposite.  For an external format, limiting
> > >> >> > ourselves to a single choice seems like a bad idea and increasing the
> > >> >> > audience seems like a better choice.
> > >> >> >
> > >> >> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > >> >> >
> > >> >> >> Hi folks,
> > >> >> >>
> > >> >> >> I just commented on this first JIRA.  Here is my text:
> > >> >> >>
> > >> >> >> This issue has been hashed over a lot in the Hadoop projects.  There
> > >> >> >> was work done to compare thrift vs avro vs protobuf.  The conclusion
> > >> >> >> was to use protobuf.
> > >> >> >>
> > >> >> >> Prior to this move, there had been a lot of noise about pluggable RPC
> > >> >> >> transports, and whatnot. It held up adoption of a backwards compatible
> > >> >> >> serialization framework for a long time. The problem ended up being
> > >> >> >> the analysis-paralysis, rather than the specific implementation
> > >> >> >> problem. In other words, the problem was a LACK of implementation
> > >> >> >> rather than actual REAL problems.
> > >> >> >>
> > >> >> >> Based on this experience, I'd strongly suggest adopting protobuf and
> > >> >> >> moving on. Forget about pluggable RPC implementations; the complexity
> > >> >> >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
> > >> >> >> format for Hadoop and HBase, which allows Drill to draw on the broad
> > >> >> >> experience of those communities who need to implement high performance
> > >> >> >> backwards compatible RPC serialization.
> > >> >> >>
> > >> >> >> ====
> > >> >> >>
> > >> >> >> Expanding a bit, I've looked into this issue a lot, and there are very
> > >> >> >> few significant concrete reasons to choose protobuf vs thrift.  A tiny
> > >> >> >> percent faster at this or that, etc.  I'd strongly suggest protobuf
> > >> >> >> for the expanded community.  There is no particular Apache imperative
> > >> >> >> that Apache projects re-use libraries.  Use what makes sense for your
> > >> >> >> project.
> > >> >> >>
> > >> >> >> As regards Avro, it's a fine serialization format for long term
> > >> >> >> data retention, but the complexities that exist to enable that make it
> > >> >> >> non-ideal for an RPC.  I know of no one who uses AvroRPC in any form.
> > >> >> >>
> > >> >> >> -ryan
> > >> >> >>
> > >> >> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <tshiran@maprtech.com> wrote:
> > >> >> >> > We plan to propose the architecture and interfaces in the next couple
> > >> >> >> > of weeks, which will make it easy to divide the project into clear
> > >> >> >> > building blocks. At that point it will be easier to start contributing
> > >> >> >> > different data sources, data formats, operators, query languages, etc.
> > >> >> >> >
> > >> >> >> > The contributions are done in the usual Apache way. It's best to open a
> > >> >> >> > JIRA and then post a patch so that others can review and then a
> > >> >> >> > committer can check it in.
> > >> >> >> >
> > >> >> >> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <chandanmadhesia@gmail.com> wrote:
> > >> >> >> >
> > >> >> >> >> Hi
> > >> >> >> >>
> > >> >> >> >> What is the process to become a contributor to Drill?
> > >> >> >> >>
> > >> >> >> >> Regards
> > >> >> >> >> chandan
> > >> >> >> >>
> > >> >> >> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > >> >> >> >>
> > >> >> >> >> > Suffice it to say that if *you* think it is important enough to
> > >> >> >> >> > implement and maintain, then the group shouldn't say nay.  The
> > >> >> >> >> > consensus stuff should only block things that break something else.
> > >> >> >> >> > Additive features that are highly maintainable (or which come with
> > >> >> >> >> > commitments) shouldn't generally be blocked.
> > >> >> >> >> >
> > >> >> >> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <michael.hausenblas@gmail.com> wrote:
> > >> >> >> >> >
> > >> >> >> >> > > Good. Feel free to put me down for that, if the group as a whole
> > >> >> >> >> > > thinks that (supporting Thrift) makes sense.
> > >> >> >> >> > >
> > >> >> >> >> >
> > >> >> >> >>
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > --
> > >> >> >> > Tomer Shiran
> > >> >> >> > Director of Product Management | MapR Technologies | 650-804-8657
> > >> >> >>
> > >> >>
> > >>
> >
>
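
As an illustration of the "compact encoding" mentioned in the thread, here is
a minimal Python sketch of the base-128 varint scheme that protobuf uses for
integer fields. It is not code from the thread or from any protobuf library;
the function names encode_varint and decode_varint are hypothetical, and the
sketch assumes non-negative integers only.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint.

    Each output byte carries 7 payload bits, least significant group first.
    The high bit of a byte is set when more bytes follow.
    """
    if n < 0:
        raise ValueError("this sketch handles only non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)  # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Decode one varint from the start of data.

    Returns (value, number_of_bytes_consumed).
    """
    value = 0
    shift = 0
    for i, b in enumerate(data):
        value |= (b & 0x7F) << shift
        if not (b & 0x80):
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")
```

For example, encode_varint(300) yields the two bytes 0xAC 0x02, where a fixed
32-bit field would take four. The "group varint" encoding also mentioned in
the thread is a different scheme and is not shown here.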
