drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jhyde.apa...@gmail.com>
Subject Re: [DISCUSS] Get off Calcite Forked Version
Date Tue, 17 Nov 2015 17:48:52 GMT
I can do a hackathon if it helps. You're approaching it as a "big bang" but we could approach
it as a burn down, counting the number of commits different between drill's fork and master.
 There's a lot of incremental value each commit we remove. 

Julian

> On Nov 17, 2015, at 08:58, Jacques Nadeau <jacques@dremio.com> wrote:
> 
> Hey Jinfeng,
> 
> Thanks for pulling this together. It is definitely time to get off the
> fork. I see a number of items with PCM. Should we do a hackathon to do all
> the PCMs one afternoon? Maybe we could convince Julian to join so we could
> make short progress of the issues. The original commits that have PCM were
> done by the following developers: Mehant, Jinfeng, Jacques, Venki & Aman.
> Even if we just get Aman, Jinfeng and myself, we could probably close out
> most of the gaps. (Mehant only has one and it seems pretty small. Venki's
> is larger but is only one as well.)
> 
> It seems like an achievable goal that we could get could get on top of
> 1.5.x master + star + 2-3 other patches for the 1.4 release. What are your
> thoughts?
> 
> 
> 
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> 
>> On Tue, Nov 10, 2015 at 2:55 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>> 
>> Julian,
>> 
>> Thanks a lot for your detailed comments!
>> 
>> I totally agree with you on "calcite first" policy. That's why on MapR
>> side, I recommend people get changes in Calcite master first, then
>> cherry-pick back. Several recent patches follow this patten, as we
>> realized the pain to maintain this calcite fork.
>> 
>> I also agree that we should get them in ASAP and it is a good idea to
>> start with simple patch firsts.
>> 
>> 
>> 1. Why do you need "Add new method to ViewExpander interface to allow
>> passing SchemaRoot."? Most ViewExpander implementations have a root
>> schema. Note that the return type of that method has changed from
>> RelNode to RelRoot.
>> 
>> [
>> Venki might have better idea. As far as I remember, the patch was
>> merged to handle case where the view is created in a different schema
>> from the source schema. Drill uses Frameworks, not the regular code
>> path that Calcite master uses. Not sure if that's the cause of the
>> problem on Drill side.
>> ]
>> 
>> 2. "Match Logical Rel in ReduceExpressionRule." should be easy now we've
>> parameterized the rules, and maybe you can get rid of "Enable
>> inheriting from previously singleton filter and calc constant
>> reduction rules.".
>> 
>> [ Agree.]
>> 
>> 3. "Disable the nullability cast checking in Filter." is worth discussing
>> on the Calcite list.
>> 
>> [ Will do that once I revise the patch with testcase.]
>> 
>> 4. In "Return validated row type and validated SqlNode when call
>> Planner.validate()." you should return a Pair<SqlNode, RelDataType>.
>> 
>> [ The patch essentially returns Pair<SqlNode, RelDataType>, since it
>> put SqlNode and RelDataType in a wrapper. With Calcite master, seems
>> RelRoot probably is a better choice; it includes everything that we
>> want.
>> ]
>> 
>> 
>> 5. To do "Use case-insensitive comparison when creating output row type
>> of a JoinRel." properly we require a new config parameter saying
>> whether internal field names are case-sensitive.
>> 
>> Since "Don't use 'SumEmptyIsZero' (SUM0) window aggregate until
>> CALCITE-777 is fixed." requires a fix to CALCITE-777 please contribute
>> a fix for CALCITE-777! :)
>> 
>> [Aman will have better idea]
>> 
>> 6. Why do you need "Make public the constructor of
>> SqlSumEmptyIsZeroAggFunction,"?
>> 
>> [ It's used in Dril's ReduceAggregateRule. We should revisit this rule
>> to see whether it's still required to keep this rule in stead of using
>> Calcite's ReduceAggregateRule directly.
>> ]
>> 
>> 7. You don't need "Make certain push project constructors protected such
>> that derived classes can access them." now we've parameterized
>> ProjectFilterTransposeRule.
>> 
>> [Agree.]
>> 
>> 8. "DRILL-1455: Add return type-inference strategy for arithmetic
>> operators when one of the arguments is ANY type." needs some
>> discussion. What if both are ANY? How are other implementations of the
>> operators (e.g. Calcite, Phoenix) going to implement the operator now
>> that Drill's lax validation policy has allowed it in?
>> 
>> [  Will discuss in Calcite list once we have the PR ready for this patch.
>> ]
>> 
>> 9. I can't see yet how the [StarColumn] changes could make it in. Will
>> require some discussion. Note that CALCITE-546 has broken some of your
>> assumptions.
>> 
>> [ I know it would be big headache to resolve the changes related to
>> star column for schema-on-read table.  I have to re-visit those
>> changes, and probably start a discussion on Calcite list once I do a
>> rebase and code cleanup
>> ]
>> 
>> Again, thanks for your feedback. I feel it's the right time for the
>> Drill community to try to get off calcite fork!
>> 
>> Regards,
>> 
>> Jinfeng
>> 
>> 
>>> On Tue, Nov 10, 2015 at 1:51 PM, Julian Hyde <jhyde@apache.org> wrote:
>>> Going forward, I think you should start a "calcite first" policy --
>>> get changes accepted into Calcite, with tests of course, then
>>> cherry-pick into Drill's branch. Recent changes like "Allow a MAP
>>> literal type." should have done that. And especially changes where you
>>> could just parameterize planner rule. In other words, stop digging the
>>> hole deeper.
>>> 
>>> I think you should identify and do the easy ones ASAP.
>>> 
>>> Rather than creating a branch for each of these contributions, if you
>>> prefer you could create a pull request of the whole branch I think
>>> that individual JIRAs can just reference a commit id that a Calcite
>>> committer can cherry-pick.
>>> 
>>> Now some remarks on specific changes.
>>> 
>>> Why do you need "Add new method to ViewExpander interface to allow
>>> passing SchemaRoot."? Most ViewExpander implementations have a root
>>> schema. Note that the return type of that method has changed from
>>> RelNode to RelRoot.
>>> 
>>> "Match Logical Rel in ReduceExpressionRule." should be easy now we've
>>> parameterized the rules, and maybe you can get rid of "Enable
>>> inheriting from previously singleton filter and calc constant
>>> reduction rules.".
>>> 
>>> "Disable the nullability cast checking in Filter." is worth discussing
>>> on the Calcite list.
>>> 
>>> In "Return validated row type and validated SqlNode when call
>>> Planner.validate()." you should return a Pair<SqlNode, RelDataType>.
>>> 
>>> To do "Use case-insensitive comparison when creating output row type
>>> of a JoinRel." properly we require a new config parameter saying
>>> whether internal field names are case-sensitive.
>>> 
>>> Since "Don't use 'SumEmptyIsZero' (SUM0) window aggregate until
>>> CALCITE-777 is fixed." requires a fix to CALCITE-777 please contribute
>>> a fix for CALCITE-777! :)
>>> 
>>> Why do you need "Make public the constructor of
>> SqlSumEmptyIsZeroAggFunction,"?
>>> 
>>> You don't need "Make certain push project constructors protected such
>>> that derived classes can access them." now we've parameterized
>>> ProjectFilterTransposeRule.
>>> 
>>> "DRILL-1455: Add return type-inference strategy for arithmetic
>>> operators when one of the arguments is ANY type." needs some
>>> discussion. What if both are ANY? How are other implementations of the
>>> operators (e.g. Calcite, Phoenix) going to implement the operator now
>>> that Drill's lax validation policy has allowed it in?
>>> 
>>> I can't see yet how the [StarColumn] changes could make it in. Will
>>> require some discussion. Note that CALCITE-546 has broken some of your
>>> assumptions.
>>> 
>>> Julian
>>> 
>>> 
>>> 
>>>> On Tue, Nov 10, 2015 at 12:27 PM, Jinfeng Ni <jni@apache.org> wrote:
>>>> Currently, Drill maintains a forked version of Calcite. Because of
>>>> this, it has been a big headache to rebase onto the latest Calcite
>>>> master branch every time when Calcite makes a new release.
>>>> 
>>>> During today's Drill hangout, we discuss ideas around how to get Drill
>>>> off the forked Calcite, by either pushing back patches in the forked
>>>> version to Calcite master branch, or trying to see if those patches
>>>> are still required for Drill to work properly.
>>>> 
>>>> There are 49 commits in forked Calcite whose base is Calcite 1.4.0
>>>> release [1]. I put a spreadsheet for the current status of those 49
>>>> commits, as far as I know [2]. Half of them are the patches that were
>>>> cherry-picked from Calcite master branch,  9 are just for version
>>>> change, and the rest requires either push back or re-work in Drill
>>>> side.
>>>> 
>>>> I think we probably have to spend considerable effort to get this done
>>>> ASAP. Otherwise, it will continue to be an issue for Drill.
>>>> 
>>>> There are two separate work to be done:
>>>> 1. Continue to rebase forked version onto latest Calcite master
>>>> (Currently it's 1.5.0)
>>>> 2. Push the patches in fork to Calcite master. Once all patches are
>>>> in, we could finally get rid of fork and rebasing efforts.
>>>> 
>>>> The spreadsheet has a column for sign-up.
>>>> 
>>>> Thoughts?
>>>> 
>>>> (let me know if you have problem accessing the spreadsheet).
>>>> 
>>>> 
>>>> [1] https://github.com/dremio/calcite/tree/DrillCalcite1.4.0
>>>> [2]
>> https://docs.google.com/spreadsheets/d/1Z0J5aMCBfqq_9SEl-LXsi_B8Ahaosygqke4cH6pPwmo/edit#gid=0
>> 

Mime
View raw message