asterixdb-notifications mailing list archives

From "Taewoo Kim (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ASTERIXDB-1246) Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.
Date Sun, 03 Jan 2016 02:48:39 GMT
Taewoo Kim created ASTERIXDB-1246:
-------------------------------------

             Summary: Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.
                 Key: ASTERIXDB-1246
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1246
             Project: Apache AsterixDB
          Issue Type: Bug
            Reporter: Taewoo Kim
            Assignee: Taewoo Kim


Unnecessary decor variables of a group-by are not removed until PushProjectDownRule is fired.

Currently, a group-by operator for a subplan is introduced when IntroduceGroupByForSubplanRule
is fired. At that point, decor variables for the new group-by operator are also added, based on
which variables are used after the new group-by operator.

After this rule, other optimizations might make some decor variables unnecessary. One example is
an assign after the group-by that is moved before the group-by operator, so that a record variable
(e.g., $$0) required by that assign no longer needs to be passed through the group-by
operator. These unnecessary decor variables are removed only when PushProjectDownRule
is fired.

As the rule name suggests, PushProjectDownRule is fired only when the plan contains a project
operator. In my branch (the index-only plan branch), this affects IntroduceSelectAccessMethodRule,
which transforms a plan into an index-utilization plan. This rule checks whether the given plan
is an index-only plan by examining the variables used after a SELECT operator. If only the
secondary key and/or the primary key are used, then the given plan is an index-only plan and we
can use a secondary-index search to return the SK and PK.
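
The check described above can be pictured as a subset test. The following is only a minimal
sketch, not the actual IntroduceSelectAccessMethodRule code: the function name and the variable
names ($$sk, $$pk, $$0) are hypothetical, and plan variables are modeled as plain strings.

```python
def is_index_only_plan(vars_used_after_select, secondary_key_vars, primary_key_vars):
    """Return True when every variable used after the SELECT is an SK or PK variable.

    Hypothetical model: variables are plain strings, not optimizer objects.
    """
    allowed = set(secondary_key_vars) | set(primary_key_vars)
    return set(vars_used_after_select) <= allowed


# Only SK and PK are used after the SELECT: qualifies as index-only.
print(is_index_only_plan({"$$sk", "$$pk"}, {"$$sk"}, {"$$pk"}))

# A record variable ($$0) is still referenced: does not qualify.
print(is_index_only_plan({"$$sk", "$$pk", "$$0"}, {"$$sk"}, {"$$pk"}))
```

In this model, any variable that is still referenced in the plan after the SELECT, even one that
no operator really needs, is enough to make the test fail.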

The issue is that IntroduceSelectAccessMethodRule is fired before PushProjectDownRule, and
generally no project operator has been introduced in the plan before IntroduceSelectAccessMethodRule
is fired. So, although these unnecessary decor variables are not used, they still sit in the plan,
and the optimizer wrongly classifies the given plan as a non-index-only plan. The following
is an example query. If we have a secondary index on countA (returned as count1; PK: tweetid),
then the outer branch should qualify as an index-only plan. In fact, it does not, because of the
unnecessary decor variables that remain after some optimizations.

for $t1 in dataset('TweetMessages')
where $t1.countA > 0
return {
"tweetid1": $t1.tweetid,
"count1":$t1.countA,
"t2info": for $t2 in dataset('TweetMessages')
                        where $t1.countA /* +indexnl */= $t2.tweetid
                        return {"tweetid2": $t2.tweetid,
                                "count2": $t2.countB}
}

We can separate PushProjectDownRule into two rules: one that pushes projects down and one that
cleans up unnecessary decor variables.
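
As a rough illustration of what a standalone decor-cleanup step might do, here is a minimal
sketch. It is not the actual optimizer code: the GroupBy class, the clean_decor_variables
function, and the variable names are all hypothetical, and the set of variables used above the
group-by is assumed to be computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class GroupBy:
    """Hypothetical stand-in for a group-by operator; variables are plain strings."""
    group_vars: list
    decor_vars: list


def clean_decor_variables(group_by, vars_used_above):
    """Drop decor variables that no operator above the group-by uses.

    Mutates group_by.decor_vars in place and returns the removed variables.
    """
    kept = [v for v in group_by.decor_vars if v in vars_used_above]
    removed = [v for v in group_by.decor_vars if v not in vars_used_above]
    group_by.decor_vars = kept
    return removed


# $$0 (the record variable) is no longer needed once the assign has moved
# below the group-by, so it is dropped from the decor list.
gb = GroupBy(group_vars=["$$pk"], decor_vars=["$$0", "$$sk"])
removed = clean_decor_variables(gb, vars_used_above={"$$pk", "$$sk"})
print(gb.decor_vars, removed)
```

A step like this does not depend on a project operator being present in the plan.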



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
