Date: Sat, 15 Sep 2012 01:24:01 +0300
Subject: Re: Thrift?
From: Constantine Peresypkin
To: drill-dev@incubator.apache.org
References: <4C785CAB-FD0E-4C5A-8D83-7AD0B7752139@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=e0cb4e7004ed83060d04c9b0e15d

--e0cb4e7004ed83060d04c9b0e15d
Content-Type: text/plain; charset=ISO-8859-1

Protobuf is an attempt to make ASN.1 more developer-friendly (not a bad
attempt). It's simpler, has far fewer features, is easier to implement,
and has a compact encoding.
On the other hand, it's non-standard, a "reinvented wheel": they could
have just designed a "better than PER" encoding for ASN.1. And AFAIK it
has no support for the new and shiny Google encodings, like "group varint".
All in all, in the current situation it seems a better choice than ASN.1,
let alone something even more vague and non-standard such as Thrift.

On Sat, Sep 15, 2012 at 12:38 AM, Ryan Rawson wrote:

> Thanks for that Ted.
>
> Correct - internal wire format doesn't mean 'drill only supports
> protobuf encoded data'.
>
> Part of the reason to favor protobuf is that a lot of people in the
> broader 'big data' community are building a lot of experience with it.
> Hadoop and HBase both are moving to/moved to protobuf on the wire.
> Being able to leverage this expertise is valuable.
>
> There is a JIRA in Hadoop-land where someone had done a deep dive
> 'bake off' between thrift, protobuf and avro. The ultimate choice was
> protobuf for a number of reasons.
> If people want to re-do the
> analysis, I'd like to see it in the context of THAT analysis (eg: why
> the assumptions there are not the same for Drill)... if anything it'd
> give a concrete form to what can be a mire.
>
> For what it's worth, I've had many discussions along these lines with
> a variety of people including committers on Thrift, and the consensus
> is both are good choices.
>
> -ryan
>
> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning wrote:
>
> > I think that it is important to ask a few questions leading up to a
> > decision here.
> >
> > The first is a (rhetorical) show of hands about how many people believe
> > that there are no serious performance or expressivity killers when
> > comparing alternative serialization frameworks. As far as I know,
> > performance differences are not massive (and protobufs is one of the
> > leaders in any case) and the expressivity differences are essentially nil.
> > If somebody feels that there is a serious show-stopper with any option,
> > they should speak.
> >
> > The second is to ask the sense of the community whether they judge
> > progress or perfection in this decision to be most important to the
> > project. My guess is that almost everybody would prefer to see progress
> > as long as the technical choice is not subject to some horrid missing bit.
> >
> > The final question is whether it is reasonable to go along with protobufs
> > given that several very experienced engineers prefer it and would like to
> > produce code based on it. If the first two questions are answered to the
> > effect that protobufs is about as good as we will find and that progress
> > trumps small differences, then it seems that moving to follow this
> > preference of Jason and Ryan for protobufs might be a reasonable thing to
> > do.
> >
> > The question of an internal wire format, btw, does not constrain the
> > project relative to external access.
> > I think it is important to support
> > JDBC and ODBC and whatever is in common use for querying. For external
> > access the question is quite different. Whereas for the internal format
> > consensus around a single choice has large benefits, the external format
> > choice is nearly the opposite. For an external format, limiting ourselves
> > to a single choice seems like a bad idea and increasing the audience seems
> > like a better choice.
> >
> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson wrote:
> >
> >> Hi folks,
> >>
> >> I just commented on this first JIRA. Here is my text:
> >>
> >> This issue has been hashed over a lot in the Hadoop projects. There
> >> was work done to compare thrift vs avro vs protobuf. The conclusion
> >> was to use protobuf.
> >>
> >> Prior to this move, there had been a lot of noise about pluggable RPC
> >> transports, and whatnot. It held up adoption of a backwards compatible
> >> serialization framework for a long time. The problem ended up being
> >> the analysis-paralysis, rather than the specific implementation
> >> problem. In other words, the problem was a LACK of implementation rather
> >> than actual REAL problems.
> >>
> >> Based on this experience, I'd strongly suggest adopting protobuf and
> >> moving on. Forget about pluggable RPC implementations, the complexity
> >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
> >> format for Hadoop and HBase, which allows Drill to draw on the broad
> >> experience of those communities who need to implement high performance
> >> backwards compatible RPC serialization.
> >>
> >> ====
> >>
> >> Expanding a bit, I've looked into this issue a lot, and there are very
> >> few significant concrete reasons to choose protobuf vs thrift. Tiny
> >> percent faster at this, and that, etc. I'd strongly suggest protobuf
> >> for the expanded community. There is no particular Apache imperative
> >> that Apache projects re-use libraries.
> >> Use what makes sense for your
> >> project.
> >>
> >> As regards Avro, it's a fine serialization format for long term
> >> data retention, but the complexities that exist to enable that make it
> >> non-ideal for an RPC. I know of no one who uses AvroRPC in any form.
> >>
> >> -ryan
> >>
> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran wrote:
> >>
> >> > We plan to propose the architecture and interfaces in the next couple
> >> > of weeks, which will make it easy to divide the project into clear
> >> > building blocks. At that point it will be easier to start contributing
> >> > different data sources, data formats, operators, query languages, etc.
> >> >
> >> > The contributions are done in the usual Apache way. It's best to open a
> >> > JIRA and then post a patch so that others can review and then a
> >> > committer can check it in.
> >> >
> >> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia
> >> > <chandanmadhesia@gmail.com> wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> What is the process to become a contributor to drill ?
> >> >>
> >> >> Regards
> >> >> chandan
> >> >>
> >> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning wrote:
> >> >>
> >> >> > Suffice it to say that if *you* think it is important enough to
> >> >> > implement and maintain, then the group shouldn't say nay. The
> >> >> > consensus stuff should only block things that break something else.
> >> >> > Additive features that are highly maintainable (or which come with
> >> >> > commitments) shouldn't generally be blocked.
> >> >> >
> >> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas
> >> >> > <michael.hausenblas@gmail.com> wrote:
> >> >> >
> >> >> > > Good. Feel free to put me down for that, if the group as a whole
> >> >> > > thinks that (supporting Thrift) makes sense.
> >> >
> >> > --
> >> > Tomer Shiran
> >> > Director of Product Management | MapR Technologies | 650-804-8657
> >> >

--e0cb4e7004ed83060d04c9b0e15d--