Return-Path: X-Original-To: apmail-drill-dev-archive@www.apache.org Delivered-To: apmail-drill-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 203DF18DEF for ; Tue, 17 Nov 2015 18:46:07 +0000 (UTC) Received: (qmail 43929 invoked by uid 500); 17 Nov 2015 18:46:06 -0000 Delivered-To: apmail-drill-dev-archive@drill.apache.org Received: (qmail 43879 invoked by uid 500); 17 Nov 2015 18:46:06 -0000 Mailing-List: contact dev-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list dev@drill.apache.org Received: (qmail 43867 invoked by uid 99); 17 Nov 2015 18:46:06 -0000 Received: from Unknown (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Nov 2015 18:46:06 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 316B5C5CAE for ; Tue, 17 Nov 2015 18:46:06 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.15 X-Spam-Level: X-Spam-Status: No, score=0.15 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-eu-west.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id kk5rUhWn_Fhx for ; Tue, 17 Nov 2015 18:45:56 +0000 (UTC) Received: from mail-wm0-f45.google.com (mail-wm0-f45.google.com [74.125.82.45]) by mx1-eu-west.apache.org (ASF Mail Server at mx1-eu-west.apache.org) with ESMTPS id 2E83420271 for ; Tue, 17 Nov 2015 18:45:56 +0000 (UTC) Received: by wmvv187 with SMTP id v187so242650952wmv.1 for ; Tue, 17 Nov 2015 10:45:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; bh=xnKHGM5JrIpdo00XTPbuaNYXPt32MeKy/kGSJinkasU=; b=ke6QbH/1VniCRklxPq0UG8VM91rnXube/D8U8CyOxYNFOpBk2wQBLfBiemyk+KM3aP F2ZgbnE9fJQYg2+IitD7obw3CGeRz3QYE7n7elL1T625+n+vdbeHse2E2Kz4NVM4K2q/ 0cdmaakZwFK04v4kZvKmW52NQ79f1p4mZQTHPdprXPloByWvQ2qDVS08hOrQygPdmmm8 MQ3vsoerhdt/3OubVJIB29vEcVxDW55X/kputT16+7hb7tpDoRf7HpT2kaYkAAej4K1S yhWBOhU6QYlq/l7yNO8FfwGLmK4waXlyjCIl5a5womFe2cMI6A8UlRQkhJ+UpMHP/F0j wZ2Q== MIME-Version: 1.0 X-Received: by 10.194.91.234 with SMTP id ch10mr52140530wjb.69.1447785955832; Tue, 17 Nov 2015 10:45:55 -0800 (PST) Received: by 10.28.100.133 with HTTP; Tue, 17 Nov 2015 10:45:55 -0800 (PST) In-Reply-To: <50606A7C-5EAD-45F2-BDAF-6FE2F90E18BE@gmail.com> References: <50606A7C-5EAD-45F2-BDAF-6FE2F90E18BE@gmail.com> Date: Tue, 17 Nov 2015 10:45:55 -0800 Message-ID: Subject: Re: [DISCUSS] Get off Calcite Forked Version From: Jinfeng Ni To: dev@drill.apache.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable I will log Calcite JIRA for each easy commit in the fork. Thanks! On Tue, Nov 17, 2015 at 10:44 AM, Julian Hyde wrot= e: > I hear you. > > However, get Jinfeng to log a calcite jira for each "easy" commit and we = can start burning through them. As I said earlier a commit Id to cherry pic= k might be sufficient. That review process involves work on my part (or on = the part of other calcite committers reviewing) so we would be contributing= to the effort. (Yeah I know it doesn't always feel that a reviewer is doin= g useful work.) > > Julian > >> On Nov 17, 2015, at 09:54, Jacques Nadeau wrote: >> >> I'm fine with any approach. My suggestion came from my sense that, so fa= r, >> our burn down hasn't been super successful. And sometimes I feel like >> Jinfeng gets stuck doing all the heavy lifting. >> >> -- >> Jacques Nadeau >> CTO and Co-Founder, Dremio >> >>> On Tue, Nov 17, 2015 at 9:48 AM, Julian Hyde w= rote: >>> >>> I can do a hackathon if it helps. You're approaching it as a "big bang" >>> but we could approach it as a burn down, counting the number of commits >>> different between drill's fork and master. There's a lot of incrementa= l >>> value each commit we remove. >>> >>> Julian >>> >>>> On Nov 17, 2015, at 08:58, Jacques Nadeau wrote: >>>> >>>> Hey Jinfeng, >>>> >>>> Thanks for pulling this together. It is definitely time to get off the >>>> fork. I see a number of items with PCM. Should we do a hackathon to do >>> all >>>> the PCMs one afternoon? Maybe we could convince Julian to join so we >>> could >>>> make short progress of the issues. The original commits that have PCM >>> were >>>> done by the following developers: Mehant, Jinfeng, Jacques, Venki & Am= an. >>>> Even if we just get Aman, Jinfeng and myself, we could probably close = out >>>> most of the gaps. (Mehant only has one and it seems pretty small. Venk= i's >>>> is larger but is only one as well.) >>>> >>>> It seems like an achievable goal that we could get could get on top of >>>> 1.5.x master + star + 2-3 other patches for the 1.4 release. What are >>> your >>>> thoughts? >>>> >>>> >>>> >>>> -- >>>> Jacques Nadeau >>>> CTO and Co-Founder, Dremio >>>> >>>>> On Tue, Nov 10, 2015 at 2:55 PM, Jinfeng Ni >>> wrote: >>>>> >>>>> Julian, >>>>> >>>>> Thanks a lot for your detailed comments! >>>>> >>>>> I totally agree with you on "calcite first" policy. That's why on Map= R >>>>> side, I recommend people get changes in Calcite master first, then >>>>> cherry-pick back. Several recent patches follow this patten, as we >>>>> realized the pain to maintain this calcite fork. >>>>> >>>>> I also agree that we should get them in ASAP and it is a good idea to >>>>> start with simple patch firsts. >>>>> >>>>> >>>>> 1. Why do you need "Add new method to ViewExpander interface to allow >>>>> passing SchemaRoot."? Most ViewExpander implementations have a root >>>>> schema. Note that the return type of that method has changed from >>>>> RelNode to RelRoot. >>>>> >>>>> [ >>>>> Venki might have better idea. As far as I remember, the patch was >>>>> merged to handle case where the view is created in a different schema >>>>> from the source schema. Drill uses Frameworks, not the regular code >>>>> path that Calcite master uses. Not sure if that's the cause of the >>>>> problem on Drill side. >>>>> ] >>>>> >>>>> 2. "Match Logical Rel in ReduceExpressionRule." should be easy now we= 've >>>>> parameterized the rules, and maybe you can get rid of "Enable >>>>> inheriting from previously singleton filter and calc constant >>>>> reduction rules.". >>>>> >>>>> [ Agree.] >>>>> >>>>> 3. "Disable the nullability cast checking in Filter." is worth >>> discussing >>>>> on the Calcite list. >>>>> >>>>> [ Will do that once I revise the patch with testcase.] >>>>> >>>>> 4. In "Return validated row type and validated SqlNode when call >>>>> Planner.validate()." you should return a Pair. >>>>> >>>>> [ The patch essentially returns Pair, since it >>>>> put SqlNode and RelDataType in a wrapper. With Calcite master, seems >>>>> RelRoot probably is a better choice; it includes everything that we >>>>> want. >>>>> ] >>>>> >>>>> >>>>> 5. To do "Use case-insensitive comparison when creating output row ty= pe >>>>> of a JoinRel." properly we require a new config parameter saying >>>>> whether internal field names are case-sensitive. >>>>> >>>>> Since "Don't use 'SumEmptyIsZero' (SUM0) window aggregate until >>>>> CALCITE-777 is fixed." requires a fix to CALCITE-777 please contribut= e >>>>> a fix for CALCITE-777! :) >>>>> >>>>> [Aman will have better idea] >>>>> >>>>> 6. Why do you need "Make public the constructor of >>>>> SqlSumEmptyIsZeroAggFunction,"? >>>>> >>>>> [ It's used in Dril's ReduceAggregateRule. We should revisit this rul= e >>>>> to see whether it's still required to keep this rule in stead of usin= g >>>>> Calcite's ReduceAggregateRule directly. >>>>> ] >>>>> >>>>> 7. You don't need "Make certain push project constructors protected s= uch >>>>> that derived classes can access them." now we've parameterized >>>>> ProjectFilterTransposeRule. >>>>> >>>>> [Agree.] >>>>> >>>>> 8. "DRILL-1455: Add return type-inference strategy for arithmetic >>>>> operators when one of the arguments is ANY type." needs some >>>>> discussion. What if both are ANY? How are other implementations of th= e >>>>> operators (e.g. Calcite, Phoenix) going to implement the operator now >>>>> that Drill's lax validation policy has allowed it in? >>>>> >>>>> [ Will discuss in Calcite list once we have the PR ready for this >>> patch. >>>>> ] >>>>> >>>>> 9. I can't see yet how the [StarColumn] changes could make it in. Wil= l >>>>> require some discussion. Note that CALCITE-546 has broken some of you= r >>>>> assumptions. >>>>> >>>>> [ I know it would be big headache to resolve the changes related to >>>>> star column for schema-on-read table. I have to re-visit those >>>>> changes, and probably start a discussion on Calcite list once I do a >>>>> rebase and code cleanup >>>>> ] >>>>> >>>>> Again, thanks for your feedback. I feel it's the right time for the >>>>> Drill community to try to get off calcite fork! >>>>> >>>>> Regards, >>>>> >>>>> Jinfeng >>>>> >>>>> >>>>>> On Tue, Nov 10, 2015 at 1:51 PM, Julian Hyde wrot= e: >>>>>> Going forward, I think you should start a "calcite first" policy -- >>>>>> get changes accepted into Calcite, with tests of course, then >>>>>> cherry-pick into Drill's branch. Recent changes like "Allow a MAP >>>>>> literal type." should have done that. And especially changes where y= ou >>>>>> could just parameterize planner rule. In other words, stop digging t= he >>>>>> hole deeper. >>>>>> >>>>>> I think you should identify and do the easy ones ASAP. >>>>>> >>>>>> Rather than creating a branch for each of these contributions, if yo= u >>>>>> prefer you could create a pull request of the whole branch I think >>>>>> that individual JIRAs can just reference a commit id that a Calcite >>>>>> committer can cherry-pick. >>>>>> >>>>>> Now some remarks on specific changes. >>>>>> >>>>>> Why do you need "Add new method to ViewExpander interface to allow >>>>>> passing SchemaRoot."? Most ViewExpander implementations have a root >>>>>> schema. Note that the return type of that method has changed from >>>>>> RelNode to RelRoot. >>>>>> >>>>>> "Match Logical Rel in ReduceExpressionRule." should be easy now we'v= e >>>>>> parameterized the rules, and maybe you can get rid of "Enable >>>>>> inheriting from previously singleton filter and calc constant >>>>>> reduction rules.". >>>>>> >>>>>> "Disable the nullability cast checking in Filter." is worth discussi= ng >>>>>> on the Calcite list. >>>>>> >>>>>> In "Return validated row type and validated SqlNode when call >>>>>> Planner.validate()." you should return a Pair. >>>>>> >>>>>> To do "Use case-insensitive comparison when creating output row type >>>>>> of a JoinRel." properly we require a new config parameter saying >>>>>> whether internal field names are case-sensitive. >>>>>> >>>>>> Since "Don't use 'SumEmptyIsZero' (SUM0) window aggregate until >>>>>> CALCITE-777 is fixed." requires a fix to CALCITE-777 please contribu= te >>>>>> a fix for CALCITE-777! :) >>>>>> >>>>>> Why do you need "Make public the constructor of >>>>> SqlSumEmptyIsZeroAggFunction,"? >>>>>> >>>>>> You don't need "Make certain push project constructors protected suc= h >>>>>> that derived classes can access them." now we've parameterized >>>>>> ProjectFilterTransposeRule. >>>>>> >>>>>> "DRILL-1455: Add return type-inference strategy for arithmetic >>>>>> operators when one of the arguments is ANY type." needs some >>>>>> discussion. What if both are ANY? How are other implementations of t= he >>>>>> operators (e.g. Calcite, Phoenix) going to implement the operator no= w >>>>>> that Drill's lax validation policy has allowed it in? >>>>>> >>>>>> I can't see yet how the [StarColumn] changes could make it in. Will >>>>>> require some discussion. Note that CALCITE-546 has broken some of yo= ur >>>>>> assumptions. >>>>>> >>>>>> Julian >>>>>> >>>>>> >>>>>> >>>>>>> On Tue, Nov 10, 2015 at 12:27 PM, Jinfeng Ni wrote= : >>>>>>> Currently, Drill maintains a forked version of Calcite. Because of >>>>>>> this, it has been a big headache to rebase onto the latest Calcite >>>>>>> master branch every time when Calcite makes a new release. >>>>>>> >>>>>>> During today's Drill hangout, we discuss ideas around how to get Dr= ill >>>>>>> off the forked Calcite, by either pushing back patches in the forke= d >>>>>>> version to Calcite master branch, or trying to see if those patches >>>>>>> are still required for Drill to work properly. >>>>>>> >>>>>>> There are 49 commits in forked Calcite whose base is Calcite 1.4.0 >>>>>>> release [1]. I put a spreadsheet for the current status of those 49 >>>>>>> commits, as far as I know [2]. Half of them are the patches that we= re >>>>>>> cherry-picked from Calcite master branch, 9 are just for version >>>>>>> change, and the rest requires either push back or re-work in Drill >>>>>>> side. >>>>>>> >>>>>>> I think we probably have to spend considerable effort to get this d= one >>>>>>> ASAP. Otherwise, it will continue to be an issue for Drill. >>>>>>> >>>>>>> There are two separate work to be done: >>>>>>> 1. Continue to rebase forked version onto latest Calcite master >>>>>>> (Currently it's 1.5.0) >>>>>>> 2. Push the patches in fork to Calcite master. Once all patches are >>>>>>> in, we could finally get rid of fork and rebasing efforts. >>>>>>> >>>>>>> The spreadsheet has a column for sign-up. >>>>>>> >>>>>>> Thoughts? >>>>>>> >>>>>>> (let me know if you have problem accessing the spreadsheet). >>>>>>> >>>>>>> >>>>>>> [1] https://github.com/dremio/calcite/tree/DrillCalcite1.4.0 >>>>>>> [2] >>>>> >>> https://docs.google.com/spreadsheets/d/1Z0J5aMCBfqq_9SEl-LXsi_B8Ahaosyg= qke4cH6pPwmo/edit#gid=3D0 >>>>> >>>