calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Danny Chan <yuzhao....@gmail.com>
Subject Re: Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
Date Wed, 09 Oct 2019 12:45:30 GMT
I have the same feeling with Stamatis, we should have a full support for IN operator. Compared
with “IS NOT DISTINCT FROM”, I think they have much in common, we have did some work like
CALCITE-3174 to collapse the “IS NOT DISTINCT FROM” operator to make the whole operator
can be pushed down.

Although we make heavy use of range sets in DateRangeRules instead of the IN operator itself,
IN is a standard sql operator, I’m inclined to make it into the core operators.

Best,
Danny Chan
在 2019年10月8日 +0800 PM3:09,Stamatis Zampetakis <zabetak@gmail.com>,写道:
> It might be better to add a proper IN operator in RexCalls instead of
> something internal that does more or less the same thing.
> It is true that adds more paths in the code and thus requires some
> additional dev and further support but I think it is worth it.
> Many people so far expressed an interest to work on various cases involving
> an IN operator so it might not be long before
> we have full support for the IN operator.
>
> SqlToRelConverter can still decide to expand or not based on some criterion
> or property.
>
>
> On Tue, Oct 8, 2019 at 3:37 AM Julian Hyde <jhyde@apache.org> wrote:
>
> > A SqlCall to $HARD_IN will (by SqlToRelConverter) become a RexCall to
> > $HARD_IN, and then (by RelToSqlConverter) become a SqlCall to
> > $HARD_IN. $HARD_IN(x, v1, v2) would become (by SqlWriter) the SQL "x
> > IN (v1, v2)".
> >
> > At any point in this lifecycle, you could intercept and and simplify.
> >
> > On Mon, Oct 7, 2019 at 2:34 PM Haisheng Yuan <h.yuan@alibaba-inc.com>
> > wrote:
> > >
> > > Will the filter condition with “$HARD_IN” internal function be able to
> > pushed down and be recognized by the source SQL system, like Peter
> > mentioned?
> > >
> > > If not, we have to translate the internal function back to IN during
> > Rel2Sql phase. Otherwise, the data read from the source table can be much
> > larger.
> > >
> > > - Haisheng
> > >
> > > ------------------------------------------------------------------
> > > 发件人:Julian Hyde<jhyde@apache.org>
> > > 日 期:2019年10月08日 04:53:11
> > > 收件人:dev<dev@calcite.apache.org>
> > > 主 题:Re: [EXT] SqlRexConvertlet that Replicates "IN" Conversion Logic
> > >
> > > In
> > https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16946209
> > <
> > https://issues.apache.org/jira/browse/CALCITE-2792?focusedCommentId=16946209&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16946209>
> > I floated the idea of a “$HARD_IN” internal function that has the same
> > semantics as IN but is not expanded to ‘… = OR … = …’.
> > >
> > > I think it would be a useful tool, if used judiciously.
> > >
> > > Julian
> > >
> > >
> > > > On Oct 4, 2019, at 7:08 PM, Haisheng Yuan <h.yuan@alibaba-inc.com>
> > wrote:
> > > >
> > > > As a workaround, you can modify you SqlRexConverlet, create a RexCall
> > with balanced binary tree, e.g. (a=1 or a=2) or (a=3 or a=4), instead of a
> > flat RexCall with multiple operands, e.g. a=1 or a=2 or a=3 or a=4.
> > > > Because every OR RexCall has exactly 2 operands, it won't transform
> > into SqlCall with left deep tree.
> > > >
> > > > Let me know it works for you or not.
> > > >
> > > > - Haisheng
> > > >
> > > > ------------------------------------------------------------------
> > > > 发件人:Haisheng Yuan<h.yuan@alibaba-inc.com>
> > > > 日 期:2019年10月05日 07:37:04
> > > > 收件人:Peter Wicks (pwicks)<pwicks@micron.com>; dev@calcite.apache.org<
> > dev@calcite.apache.org>
> > > > 主 题:Re: RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> > Logic
> > > >
> > > > If you want to push the filter down to the source SQL sytem, then
> > transforming to a join won't help you either.
> > > >
> > > > The reason of stackoverflow for large ORs is the left deep binary
> > tree, we need to change it to balanced binary tree, to reduce the depth of
> > the call.
> > > >
> > > > I will open a pull request later.
> > > >
> > > > - Haisheng
> > > >
> > > > ------------------------------------------------------------------
> > > > 发件人:Peter Wicks (pwicks)<pwicks@micron.com>
> > > > 日 期:2019年10月04日 21:32:25
> > > > 收件人:dev@calcite.apache.org<dev@calcite.apache.org>
> > > > 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> > Logic
> > > >
> > > > Zoltan,
> > > >
> > > > Thanks for the suggestion. I actually tried doing a UDF first, and it
> > was also successful, sorry for not sharing those details earlier.
> > > > The problem with the UDF is that the predicates are not pushed down to
> > the source SQL system (by design), and this can result in a 100x increase
> > in the amount of data returned from the database. This data will be
> > correctly filtered by the UDF, but returning 100x the data makes it a lot
> > slower. So I was trying to push it down to the source server instead.
> > > >
> > > > What do you mean by, "I guess Calcite might probably won't be able to
> > do much with these ORs anyway..."? From my experiments I've seen two
> > results from passing in this many OR's:
> > > >
> > > > - If no other predicates are included in the query, then Calcite
> > succeeds! It leaves the OR's flat, (a=1 OR a=2 OR a=3 OR a=4)
> > > > - If additional predicates are included, then Calcite nests the OR
> > statements, leading to a stackoverflow for very large OR's, which is
> > CALCITE-2792, ((((a=1) OR a=2) OR a=3) OR a=4)
> > > >
> > > > Thanks,
> > > > Peter
> > > >
> > > > -----Original Message-----
> > > > From: Zoltan Haindrich <kirk@rxd.hu>
> > > > Sent: Friday, October 4, 2019 12:38 AM
> > > > To: dev@calcite.apache.org; Haisheng Yuan <h.yuan@alibaba-inc.com>;
> > Peter Wicks (pwicks) <pwicks@micron.com>
> > > > Subject: Re: [EXT] Re: SqlRexConvertlet that Replicates "IN"
> > Conversion Logic
> > > >
> > > >
> > > > I think you might try another approach: introduce some UDF and use
> > your translation logic to call that - as the UDF will be opaque for calcite
> > it will be left alone.
> > > > I guess Calcite might probably won't be able to do much with these ORs
> > anyway...
> > > >
> > > >
> > > > On 10/3/19 11:26 PM, Haisheng Yuan wrote:
> > > > > I don't think this can be done in SqlRexConvertlet, which converts
> > SqlNode to RexNode.
> > > > > You might need to massage the SqlToRelConverter to create the RelNode
> > that you want.
> > > > >
> > > > > BTW, I still think we need RexNode for IN/ANY.
> > > > I also feel that there is some need for IN nodes; but there are some
> > good sides of not having it as well: like simplification handles them
> > better.
> > > >
> > > >
> > > > >
> > > > > - Haisheng
> > > > >
> > > > > ------------------------------------------------------------------
> > > > > 发件人:Peter Wicks (pwicks)<pwicks@micron.com>
> > > > > 日 期:2019年10月04日 04:03:51
> > > > > 收件人:dev@calcite.apache.org<dev@calcite.apache.org>
> > > > > 主 题:RE: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> > Logic
> > > > >
> > > > > Haisheng,
> > > > >
> > > > > Yes, that is what I would like to do. Unfortunately, I’m not sure
how
> > to proceed to actually do that. I was hoping for a pointer to an example
> > that is similar?
> > > > >
> > > > > Thanks!
> > > > > Peter
> > > > >
> > > > > From: Haisheng Yuan <h.yuan@alibaba-inc.com>
> > > > > Sent: Thursday, October 3, 2019 1:35 PM
> > > > > To: Peter Wicks (pwicks) <pwicks@micron.com>; dev@calcite.apache.org
> > > > > Subject: [EXT] Re: SqlRexConvertlet that Replicates "IN" Conversion
> > Logic
> > > > >
> > > > > Currently Calcite doesn't have IN RexNode, only has IN SqlNode,
> > unfortunately.
> > > > >
> > > > > You can create a Values node with these authorization data, and make
> > a semi join with the table and Values you created.
> > > > >
> > > > > - Haisheng
> > > > >
> > > > > ------------------------------------------------------------------
> > > > > 发件人:Peter Wicks (pwicks)<pwicks@micron.com<mailto:pwicks@micron.com>>
> > > > > 日 期:2019年10月04日 02:34:02
> > > > > 收件人:dev@calcite.apache.org<dev@calcite.apache.org<mailto:
> > dev@calcite.apache.org%3cdev@calcite.apache.org>>
> > > > > 主 题:SqlRexConvertlet that Replicates "IN" Conversion Logic
> > > > >
> > > > > A little detail about what I'm trying to do:
> > > > >
> > > > > I have an external API that contains authorization information on
a
> > per user basis. I want users to be able to include an operation in their
> > query that will filter data based on this authorization data.
> > > > >
> > > > > Using Calcite v1.16 / Java 1.8 / RHEL7, I built a class that
> > implements SqlRexConvertlet, and I am able to get this working. The user
> > includes in their predicate statement `custom_authorize(column)`, my
> > convertlet queries the API, gets the authorization rules, builds an OR
> > statement, and the results come back. This works sometimes, but other times
> > the OR condition becomes too large, and I run into CALCITE-2792:
> > https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCALCITE-2792&amp;data=02%7C01%7Cpwicks%40micron.com%7C1920339cebed45b0bd5b08d748955d24%7Cf38a5ecd28134862b11bac1d563c806f%7C0%7C1%7C637057678693693555&amp;sdata=0tuXcWcjTHXMQQr%2BmKXBjrSTAW%2BqBG%2Fp3PsK8df2tfk%3D&amp;reserved=0,
> > which causes a stackoverflow and my query dies.
> > > > >
> > > > > So I tried converting to an IN statement, having read that IN
> > statements are automatically converted to a sub query join when the default
> > limit of 20 is exceeded. The problem is that this appears only to be true
> > for IN statements that are included in the initial query. IN statements
> > created as the result of a convertlet do not get modified, and are sent as
> > an IN statement, which results in a failure to parse the query. I looked at
> > how Calcite normally does this translation from IN to exists using a join,
> > but it depends on a lot of classes/instances that aren't available in the
> > SqlRexContext space. Is it possible to rewrite my IN statement to a
> > join/exists query like Calcite normally does?
> > > > >
> > > > > Also, am I doing things all wrong? Is there a better way to go about
> > this?
> > > > >
> > > > > Code Sample below is for the OR version, the commented code can be
> > swapped in to see how I was building the IN statement.
> > > > >
> > > > > @Override
> > > > > public RexNode convertCall(SqlRexContext cx, SqlCall call) {
> > > > > HashSet<String> keyList = null;
> > > > > try {
> > > > > keyList = new
> > Manager().getAllowedIDs(getContextInformation().getQueryUser());
> > > > > } catch (SQLException e) {
> > > > > e.printStackTrace();
> > > > > }
> > > > >
> > > > > final RexBuilder rexBuilder = cx.getRexBuilder();
> > > > > final RexNode column = cx.convertExpression(call.operand(0));
> > > > >
> > > > > final List<RexNode> nodes = new ArrayList<>();
> > > > > for(String s: keyList) {
> > > > > nodes.add(rexBuilder.makeCall(EQUALS, column,
> > rexBuilder.makeLiteral(s)));
> > > > > //nodes.add(rexBuilder.makeLiteral(s));
> > > > > }
> > > > >
> > > > > final RexNode in = rexBuilder.makeCall(SqlStdOperatorTable.OR, nodes);
> > > > > //final RexNode in = inBuilder(rexBuilder, column, nodes.toArray(new
> > RexNode[0]));
> > > > >
> > > > > return in;
> > > > > }
> > > > >
> > > > > protected RexNode inBuilder(RexBuilder rexBuilder, RexNode node,
> > RexNode... nodes) {
> > > > > return rexBuilder.makeCall(SqlStdOperatorTable.IN,
> > > > > ImmutableList.<RexNode>builder().add(node).add(nodes).build());
> > > > > }
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Peter
> > > > >
> > > >
> > > >
> > >
> > >
> >

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message