drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: [DISCUSS] Get off Calcite Forked Version
Date Tue, 17 Nov 2015 18:45:55 GMT
I will log Calcite JIRA for each easy commit in the fork. Thanks!



On Tue, Nov 17, 2015 at 10:44 AM, Julian Hyde <jhyde.apache@gmail.com> wrote:
> I hear you.
>
> However, get Jinfeng to log a calcite jira for each "easy" commit and we can start burning
through them. As I said earlier a commit Id to cherry pick might be sufficient. That review
process involves work on my part (or on the part of other calcite committers reviewing) so
we would be contributing to the effort. (Yeah I know it doesn't always feel that a reviewer
is doing useful work.)
>
> Julian
>
>> On Nov 17, 2015, at 09:54, Jacques Nadeau <jacques@dremio.com> wrote:
>>
>> I'm fine with any approach. My suggestion came from my sense that, so far,
>> our burn down hasn't been super successful. And sometimes I feel like
>> Jinfeng gets stuck doing all the heavy lifting.
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>>> On Tue, Nov 17, 2015 at 9:48 AM, Julian Hyde <jhyde.apache@gmail.com> wrote:
>>>
>>> I can do a hackathon if it helps. You're approaching it as a "big bang"
>>> but we could approach it as a burn down, counting the number of commits
>>> different between drill's fork and master.  There's a lot of incremental
>>> value each commit we remove.
>>>
>>> Julian
>>>
>>>> On Nov 17, 2015, at 08:58, Jacques Nadeau <jacques@dremio.com> wrote:
>>>>
>>>> Hey Jinfeng,
>>>>
>>>> Thanks for pulling this together. It is definitely time to get off the
>>>> fork. I see a number of items with PCM. Should we do a hackathon to do
>>> all
>>>> the PCMs one afternoon? Maybe we could convince Julian to join so we
>>> could
>>>> make short progress of the issues. The original commits that have PCM
>>> were
>>>> done by the following developers: Mehant, Jinfeng, Jacques, Venki & Aman.
>>>> Even if we just get Aman, Jinfeng and myself, we could probably close out
>>>> most of the gaps. (Mehant only has one and it seems pretty small. Venki's
>>>> is larger but is only one as well.)
>>>>
>>>> It seems like an achievable goal that we could get could get on top of
>>>> 1.5.x master + star + 2-3 other patches for the 1.4 release. What are
>>> your
>>>> thoughts?
>>>>
>>>>
>>>>
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>>
>>>>> On Tue, Nov 10, 2015 at 2:55 PM, Jinfeng Ni <jinfengni99@gmail.com>
>>> wrote:
>>>>>
>>>>> Julian,
>>>>>
>>>>> Thanks a lot for your detailed comments!
>>>>>
>>>>> I totally agree with you on "calcite first" policy. That's why on MapR
>>>>> side, I recommend people get changes in Calcite master first, then
>>>>> cherry-pick back. Several recent patches follow this patten, as we
>>>>> realized the pain to maintain this calcite fork.
>>>>>
>>>>> I also agree that we should get them in ASAP and it is a good idea to
>>>>> start with simple patch firsts.
>>>>>
>>>>>
>>>>> 1. Why do you need "Add new method to ViewExpander interface to allow
>>>>> passing SchemaRoot."? Most ViewExpander implementations have a root
>>>>> schema. Note that the return type of that method has changed from
>>>>> RelNode to RelRoot.
>>>>>
>>>>> [
>>>>> Venki might have better idea. As far as I remember, the patch was
>>>>> merged to handle case where the view is created in a different schema
>>>>> from the source schema. Drill uses Frameworks, not the regular code
>>>>> path that Calcite master uses. Not sure if that's the cause of the
>>>>> problem on Drill side.
>>>>> ]
>>>>>
>>>>> 2. "Match Logical Rel in ReduceExpressionRule." should be easy now we've
>>>>> parameterized the rules, and maybe you can get rid of "Enable
>>>>> inheriting from previously singleton filter and calc constant
>>>>> reduction rules.".
>>>>>
>>>>> [ Agree.]
>>>>>
>>>>> 3. "Disable the nullability cast checking in Filter." is worth
>>> discussing
>>>>> on the Calcite list.
>>>>>
>>>>> [ Will do that once I revise the patch with testcase.]
>>>>>
>>>>> 4. In "Return validated row type and validated SqlNode when call
>>>>> Planner.validate()." you should return a Pair<SqlNode, RelDataType>.
>>>>>
>>>>> [ The patch essentially returns Pair<SqlNode, RelDataType>, since
it
>>>>> put SqlNode and RelDataType in a wrapper. With Calcite master, seems
>>>>> RelRoot probably is a better choice; it includes everything that we
>>>>> want.
>>>>> ]
>>>>>
>>>>>
>>>>> 5. To do "Use case-insensitive comparison when creating output row type
>>>>> of a JoinRel." properly we require a new config parameter saying
>>>>> whether internal field names are case-sensitive.
>>>>>
>>>>> Since "Don't use 'SumEmptyIsZero' (SUM0) window aggregate until
>>>>> CALCITE-777 is fixed." requires a fix to CALCITE-777 please contribute
>>>>> a fix for CALCITE-777! :)
>>>>>
>>>>> [Aman will have better idea]
>>>>>
>>>>> 6. Why do you need "Make public the constructor of
>>>>> SqlSumEmptyIsZeroAggFunction,"?
>>>>>
>>>>> [ It's used in Dril's ReduceAggregateRule. We should revisit this rule
>>>>> to see whether it's still required to keep this rule in stead of using
>>>>> Calcite's ReduceAggregateRule directly.
>>>>> ]
>>>>>
>>>>> 7. You don't need "Make certain push project constructors protected such
>>>>> that derived classes can access them." now we've parameterized
>>>>> ProjectFilterTransposeRule.
>>>>>
>>>>> [Agree.]
>>>>>
>>>>> 8. "DRILL-1455: Add return type-inference strategy for arithmetic
>>>>> operators when one of the arguments is ANY type." needs some
>>>>> discussion. What if both are ANY? How are other implementations of the
>>>>> operators (e.g. Calcite, Phoenix) going to implement the operator now
>>>>> that Drill's lax validation policy has allowed it in?
>>>>>
>>>>> [  Will discuss in Calcite list once we have the PR ready for this
>>> patch.
>>>>> ]
>>>>>
>>>>> 9. I can't see yet how the [StarColumn] changes could make it in. Will
>>>>> require some discussion. Note that CALCITE-546 has broken some of your
>>>>> assumptions.
>>>>>
>>>>> [ I know it would be big headache to resolve the changes related to
>>>>> star column for schema-on-read table.  I have to re-visit those
>>>>> changes, and probably start a discussion on Calcite list once I do a
>>>>> rebase and code cleanup
>>>>> ]
>>>>>
>>>>> Again, thanks for your feedback. I feel it's the right time for the
>>>>> Drill community to try to get off calcite fork!
>>>>>
>>>>> Regards,
>>>>>
>>>>> Jinfeng
>>>>>
>>>>>
>>>>>> On Tue, Nov 10, 2015 at 1:51 PM, Julian Hyde <jhyde@apache.org>
wrote:
>>>>>> Going forward, I think you should start a "calcite first" policy
--
>>>>>> get changes accepted into Calcite, with tests of course, then
>>>>>> cherry-pick into Drill's branch. Recent changes like "Allow a MAP
>>>>>> literal type." should have done that. And especially changes where
you
>>>>>> could just parameterize planner rule. In other words, stop digging
the
>>>>>> hole deeper.
>>>>>>
>>>>>> I think you should identify and do the easy ones ASAP.
>>>>>>
>>>>>> Rather than creating a branch for each of these contributions, if
you
>>>>>> prefer you could create a pull request of the whole branch I think
>>>>>> that individual JIRAs can just reference a commit id that a Calcite
>>>>>> committer can cherry-pick.
>>>>>>
>>>>>> Now some remarks on specific changes.
>>>>>>
>>>>>> Why do you need "Add new method to ViewExpander interface to allow
>>>>>> passing SchemaRoot."? Most ViewExpander implementations have a root
>>>>>> schema. Note that the return type of that method has changed from
>>>>>> RelNode to RelRoot.
>>>>>>
>>>>>> "Match Logical Rel in ReduceExpressionRule." should be easy now we've
>>>>>> parameterized the rules, and maybe you can get rid of "Enable
>>>>>> inheriting from previously singleton filter and calc constant
>>>>>> reduction rules.".
>>>>>>
>>>>>> "Disable the nullability cast checking in Filter." is worth discussing
>>>>>> on the Calcite list.
>>>>>>
>>>>>> In "Return validated row type and validated SqlNode when call
>>>>>> Planner.validate()." you should return a Pair<SqlNode, RelDataType>.
>>>>>>
>>>>>> To do "Use case-insensitive comparison when creating output row type
>>>>>> of a JoinRel." properly we require a new config parameter saying
>>>>>> whether internal field names are case-sensitive.
>>>>>>
>>>>>> Since "Don't use 'SumEmptyIsZero' (SUM0) window aggregate until
>>>>>> CALCITE-777 is fixed." requires a fix to CALCITE-777 please contribute
>>>>>> a fix for CALCITE-777! :)
>>>>>>
>>>>>> Why do you need "Make public the constructor of
>>>>> SqlSumEmptyIsZeroAggFunction,"?
>>>>>>
>>>>>> You don't need "Make certain push project constructors protected
such
>>>>>> that derived classes can access them." now we've parameterized
>>>>>> ProjectFilterTransposeRule.
>>>>>>
>>>>>> "DRILL-1455: Add return type-inference strategy for arithmetic
>>>>>> operators when one of the arguments is ANY type." needs some
>>>>>> discussion. What if both are ANY? How are other implementations of
the
>>>>>> operators (e.g. Calcite, Phoenix) going to implement the operator
now
>>>>>> that Drill's lax validation policy has allowed it in?
>>>>>>
>>>>>> I can't see yet how the [StarColumn] changes could make it in. Will
>>>>>> require some discussion. Note that CALCITE-546 has broken some of
your
>>>>>> assumptions.
>>>>>>
>>>>>> Julian
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Tue, Nov 10, 2015 at 12:27 PM, Jinfeng Ni <jni@apache.org>
wrote:
>>>>>>> Currently, Drill maintains a forked version of Calcite. Because
of
>>>>>>> this, it has been a big headache to rebase onto the latest Calcite
>>>>>>> master branch every time when Calcite makes a new release.
>>>>>>>
>>>>>>> During today's Drill hangout, we discuss ideas around how to
get Drill
>>>>>>> off the forked Calcite, by either pushing back patches in the
forked
>>>>>>> version to Calcite master branch, or trying to see if those patches
>>>>>>> are still required for Drill to work properly.
>>>>>>>
>>>>>>> There are 49 commits in forked Calcite whose base is Calcite
1.4.0
>>>>>>> release [1]. I put a spreadsheet for the current status of those
49
>>>>>>> commits, as far as I know [2]. Half of them are the patches that
were
>>>>>>> cherry-picked from Calcite master branch,  9 are just for version
>>>>>>> change, and the rest requires either push back or re-work in
Drill
>>>>>>> side.
>>>>>>>
>>>>>>> I think we probably have to spend considerable effort to get
this done
>>>>>>> ASAP. Otherwise, it will continue to be an issue for Drill.
>>>>>>>
>>>>>>> There are two separate work to be done:
>>>>>>> 1. Continue to rebase forked version onto latest Calcite master
>>>>>>> (Currently it's 1.5.0)
>>>>>>> 2. Push the patches in fork to Calcite master. Once all patches
are
>>>>>>> in, we could finally get rid of fork and rebasing efforts.
>>>>>>>
>>>>>>> The spreadsheet has a column for sign-up.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> (let me know if you have problem accessing the spreadsheet).
>>>>>>>
>>>>>>>
>>>>>>> [1] https://github.com/dremio/calcite/tree/DrillCalcite1.4.0
>>>>>>> [2]
>>>>>
>>> https://docs.google.com/spreadsheets/d/1Z0J5aMCBfqq_9SEl-LXsi_B8Ahaosygqke4cH6pPwmo/edit#gid=0
>>>>>
>>>

Mime
View raw message