drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: [DISCUSS] Get off Calcite Forked Version
Date Tue, 10 Nov 2015 22:55:56 GMT
Julian,

Thanks a lot for your detailed comments!

I totally agree with you on "calcite first" policy. That's why on MapR
side, I recommend people get changes in Calcite master first, then
cherry-pick back. Several recent patches follow this patten, as we
realized the pain to maintain this calcite fork.

I also agree that we should get them in ASAP and it is a good idea to
start with simple patch firsts.


1. Why do you need "Add new method to ViewExpander interface to allow
passing SchemaRoot."? Most ViewExpander implementations have a root
schema. Note that the return type of that method has changed from
RelNode to RelRoot.

 [
Venki might have better idea. As far as I remember, the patch was
merged to handle case where the view is created in a different schema
from the source schema. Drill uses Frameworks, not the regular code
path that Calcite master uses. Not sure if that's the cause of the
problem on Drill side.
]

2. "Match Logical Rel in ReduceExpressionRule." should be easy now we've
parameterized the rules, and maybe you can get rid of "Enable
inheriting from previously singleton filter and calc constant
reduction rules.".

[ Agree.]

3. "Disable the nullability cast checking in Filter." is worth discussing
on the Calcite list.

[ Will do that once I revise the patch with testcase.]

4. In "Return validated row type and validated SqlNode when call
Planner.validate()." you should return a Pair<SqlNode, RelDataType>.

[ The patch essentially returns Pair<SqlNode, RelDataType>, since it
put SqlNode and RelDataType in a wrapper. With Calcite master, seems
RelRoot probably is a better choice; it includes everything that we
want.
]


5. To do "Use case-insensitive comparison when creating output row type
of a JoinRel." properly we require a new config parameter saying
whether internal field names are case-sensitive.

Since "Don't use 'SumEmptyIsZero' (SUM0) window aggregate until
CALCITE-777 is fixed." requires a fix to CALCITE-777 please contribute
a fix for CALCITE-777! :)

[Aman will have better idea]

6. Why do you need "Make public the constructor of
SqlSumEmptyIsZeroAggFunction,"?

[ It's used in Dril's ReduceAggregateRule. We should revisit this rule
to see whether it's still required to keep this rule in stead of using
Calcite's ReduceAggregateRule directly.
]

7. You don't need "Make certain push project constructors protected such
that derived classes can access them." now we've parameterized
ProjectFilterTransposeRule.

[Agree.]

8. "DRILL-1455: Add return type-inference strategy for arithmetic
operators when one of the arguments is ANY type." needs some
discussion. What if both are ANY? How are other implementations of the
operators (e.g. Calcite, Phoenix) going to implement the operator now
that Drill's lax validation policy has allowed it in?

[  Will discuss in Calcite list once we have the PR ready for this patch.
]

9. I can't see yet how the [StarColumn] changes could make it in. Will
require some discussion. Note that CALCITE-546 has broken some of your
assumptions.

[ I know it would be big headache to resolve the changes related to
star column for schema-on-read table.  I have to re-visit those
changes, and probably start a discussion on Calcite list once I do a
rebase and code cleanup
]

Again, thanks for your feedback. I feel it's the right time for the
Drill community to try to get off calcite fork!

Regards,

Jinfeng


On Tue, Nov 10, 2015 at 1:51 PM, Julian Hyde <jhyde@apache.org> wrote:
> Going forward, I think you should start a "calcite first" policy --
> get changes accepted into Calcite, with tests of course, then
> cherry-pick into Drill's branch. Recent changes like "Allow a MAP
> literal type." should have done that. And especially changes where you
> could just parameterize planner rule. In other words, stop digging the
> hole deeper.
>
> I think you should identify and do the easy ones ASAP.
>
> Rather than creating a branch for each of these contributions, if you
> prefer you could create a pull request of the whole branch I think
> that individual JIRAs can just reference a commit id that a Calcite
> committer can cherry-pick.
>
> Now some remarks on specific changes.
>
> Why do you need "Add new method to ViewExpander interface to allow
> passing SchemaRoot."? Most ViewExpander implementations have a root
> schema. Note that the return type of that method has changed from
> RelNode to RelRoot.
>
> "Match Logical Rel in ReduceExpressionRule." should be easy now we've
> parameterized the rules, and maybe you can get rid of "Enable
> inheriting from previously singleton filter and calc constant
> reduction rules.".
>
> "Disable the nullability cast checking in Filter." is worth discussing
> on the Calcite list.
>
> In "Return validated row type and validated SqlNode when call
> Planner.validate()." you should return a Pair<SqlNode, RelDataType>.
>
> To do "Use case-insensitive comparison when creating output row type
> of a JoinRel." properly we require a new config parameter saying
> whether internal field names are case-sensitive.
>
> Since "Don't use 'SumEmptyIsZero' (SUM0) window aggregate until
> CALCITE-777 is fixed." requires a fix to CALCITE-777 please contribute
> a fix for CALCITE-777! :)
>
> Why do you need "Make public the constructor of SqlSumEmptyIsZeroAggFunction,"?
>
> You don't need "Make certain push project constructors protected such
> that derived classes can access them." now we've parameterized
> ProjectFilterTransposeRule.
>
> "DRILL-1455: Add return type-inference strategy for arithmetic
> operators when one of the arguments is ANY type." needs some
> discussion. What if both are ANY? How are other implementations of the
> operators (e.g. Calcite, Phoenix) going to implement the operator now
> that Drill's lax validation policy has allowed it in?
>
> I can't see yet how the [StarColumn] changes could make it in. Will
> require some discussion. Note that CALCITE-546 has broken some of your
> assumptions.
>
> Julian
>
>
>
> On Tue, Nov 10, 2015 at 12:27 PM, Jinfeng Ni <jni@apache.org> wrote:
>> Currently, Drill maintains a forked version of Calcite. Because of
>> this, it has been a big headache to rebase onto the latest Calcite
>> master branch every time when Calcite makes a new release.
>>
>> During today's Drill hangout, we discuss ideas around how to get Drill
>> off the forked Calcite, by either pushing back patches in the forked
>> version to Calcite master branch, or trying to see if those patches
>> are still required for Drill to work properly.
>>
>> There are 49 commits in forked Calcite whose base is Calcite 1.4.0
>> release [1]. I put a spreadsheet for the current status of those 49
>> commits, as far as I know [2]. Half of them are the patches that were
>> cherry-picked from Calcite master branch,  9 are just for version
>> change, and the rest requires either push back or re-work in Drill
>> side.
>>
>> I think we probably have to spend considerable effort to get this done
>> ASAP. Otherwise, it will continue to be an issue for Drill.
>>
>> There are two separate work to be done:
>> 1. Continue to rebase forked version onto latest Calcite master
>> (Currently it's 1.5.0)
>> 2. Push the patches in fork to Calcite master. Once all patches are
>> in, we could finally get rid of fork and rebasing efforts.
>>
>> The spreadsheet has a column for sign-up.
>>
>> Thoughts?
>>
>> (let me know if you have problem accessing the spreadsheet).
>>
>>
>> [1] https://github.com/dremio/calcite/tree/DrillCalcite1.4.0
>> [2] https://docs.google.com/spreadsheets/d/1Z0J5aMCBfqq_9SEl-LXsi_B8Ahaosygqke4cH6pPwmo/edit#gid=0

Mime
View raw message