asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Taewoo Kim <wangs...@gmail.com>
Subject Fwd: [jira] [Commented] (ASTERIXDB-1249) Self index join chooses wrong probe/index branch
Date Thu, 07 Jan 2016 23:01:14 GMT
Hello dev,

Regarding this issue (ASTERIXDB-1249), I would like to make an index-join
hint clarification. Let's start with an example query other than a
self-join query in this issue.

for $c in dataset('Customers')

for $o in dataset('Orders')

where $c.cid /*+ indexnl */ = $o.cid

order by $c.cid, $o.oid

return {"cid":$c.cid, "oid": $o.oid}

Right now, in the master branch, the first dataset (e.g., Customers)
becomes the outer branch and the second dataset (e.g., Orders) becomes the
inner branch. And, when the optimizer tries to honor the given indexnl hint
(transforming a join into an index-nested-loop join), if there are
applicable indexes from the inner branch (e.g., Orders), then it is going
to use one of those indexes. If there are no applicable indexes from the
inner branch, it tries to use indexes from the outer branch (e.g.,
Customers). We are going to change the last part; we will not use indexes
from the outer branch. So, the following are refined rules for transforming
a join into an index-nested-loop join.

1. The first dataset in a join (the first parameter of the given join)
becomes the outer branch.
2. The second dataset in a join (the second parameter of the given join)
becomes the inner branch.
3. We only try to use applicable indexes from the inner branch. So, if
there are no applicable indexes from the inner branch, we abort
transforming a join into an index-nested-loop join.
4. Variable order in the given join predicate is not important. It can be
either outer.fieldA = inner.fieldB or inner.fieldB = outer.fieldA.

So, for the left-outer join and inner join altogether, the left subtree is
the probing side and the right subtree is the index side. So, this can be
applied to the self-join case, too just like the following query in this
issue. In the following query, $t1 becomes the outer and $t2 becomes the
inner.

for $t1 in dataset('TweetMessages')

for $t2 in dataset('TweetMessages')

let $c := $t1.countA + 20

where $c /* +indexnl */= $t2.countB

order by $t2.tweetid

return {"tweetid2": $t2.tweetid, "count2":$t2.countB};

Thank you. Any opinion would be appreciated before I finalize this fix.

Best,
Taewoo

---------- Forwarded message ----------
From: Yingyi Bu (JIRA) <jira@apache.org>
Date: Wed, Jan 6, 2016 at 10:23 AM
Subject: [jira] [Commented] (ASTERIXDB-1249) Self index join chooses wrong
probe/index branch
To: notifications@asterixdb.incubator.apache.org



    [
https://issues.apache.org/jira/browse/ASTERIXDB-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085992#comment-15085992
]

Yingyi Bu commented on ASTERIXDB-1249:
--------------------------------------

That's right. Basically in AcessMethod implementations, e.g.:
https://github.com/apache/incubator-asterixdb/blob/master/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/RTreeAccessMethod.java#L124

Choosing probe/index branch is only based on dataset names, instead of
being based on the join condition.

> Self index join chooses wrong probe/index branch
> ------------------------------------------------
>
>                 Key: ASTERIXDB-1249
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1249
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Optimizer
>            Reporter: Yingyi Bu
>            Assignee: Taewoo Kim
>
> DDLs:
> {noformat}
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
> create type TwitterUserType as closed {
>     screen-name: string,
>     lang: string,
>     friends-count: int64,
>     statuses-count: int64,
>     name: string,
>     followers-count: int64
> }
> create type TweetMessageType as closed {
>     tweetid: int64,
>         user: TwitterUserType,
>         sender-location: point,
>     send-time: datetime,
>         referred-topics: {{ string }},
>     message-text: string,
>     countA: int64,
>     countB: int64
> }
> create dataset TweetMessages(TweetMessageType)
> primary key tweetid;
> create index twmSndLocIx on TweetMessages(sender-location) type rtree;
> create index msgCountAIx on TweetMessages(countA) type btree;
> create index msgCountBIx on TweetMessages(countB) type btree;
> create index msgTextIx on TweetMessages(message-text) type keyword;
> {noformat}
> Query:
> {noformat}
> for $t1 in dataset('TweetMessages')
> for $t2 in dataset('TweetMessages')
> let $n :=  create-circle($t1.sender-location, 0.5)
> where spatial-intersect($t2.sender-location, $n)
> order by $t2.tweetid
> return {"tweetid2":$t2.tweetid, "loc2":$t2.sender-location};
> {noformat}
> Optimized plan:
> {noformat}
> distribute result [%0->$$10]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     project ([$$10])
>     -- STREAM_PROJECT  |PARTITIONED|
>       assign [$$10] <- [function-call: asterix:closed-record-constructor,
Args:[AString: {tweetid2}, %0->$$15, AString: {loc2}, %0->$$13]]
>       -- ASSIGN  |PARTITIONED|
>         exchange
>         -- SORT_MERGE_EXCHANGE [$$15(ASC) ]  |PARTITIONED|
>           order (ASC, %0->$$15)
>           -- STABLE_SORT [$$15(ASC)]  |PARTITIONED|
>             exchange
>             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>               project ([$$13, $$15])
>               -- STREAM_PROJECT  |PARTITIONED|
>                 select (function-call: asterix:spatial-intersect,
Args:[%0->$$13, function-call: asterix:create-circle, Args:[function-call:
asterix:field-access-by-index, Args:[%0->$$0, AInt32: {2}], ADouble:
{0.5}]])
>                 -- STREAM_SELECT  |PARTITIONED|
>                   project ([$$0, $$13, $$15])
>                   -- STREAM_PROJECT  |PARTITIONED|
>                     exchange
>                     -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                       unnest-map [$$14, $$0] <- function-call:
asterix:index-search, Args:[AString: {TweetMessages}, AInt32: {0}, AString:
{test}, AString: {TweetMessages}, ABoolean: {true}, ABoolean: {false},
ABoolean: {false}, AInt32: {1}, %0->$$27, AInt32: {1}, %0->$$27, TRUE,
TRUE, TRUE]
>                       -- BTREE_SEARCH  |PARTITIONED|
>                         exchange
>                         -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                           order (ASC, %0->$$27)
>                           -- STABLE_SORT [$$27(ASC)]  |PARTITIONED|
>                             exchange
>                             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                               project ([$$27, $$13, $$15])
>                               -- STREAM_PROJECT  |PARTITIONED|
>                                 exchange
>                                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                   unnest-map [$$23, $$24, $$25, $$26,
$$27] <- function-call: asterix:index-search, Args:[AString: {twmSndLocIx},
AInt32: {1}, AString: {test}, AString: {TweetMessages}, ABoolean: {true},
ABoolean: {false}, ABoolean: {true}, AInt32: {4}, %0->$$19, %0->$$20,
%0->$$21, %0->$$22]
>                                   -- RTREE_SEARCH  |PARTITIONED|
>                                     exchange
>                                     -- BROADCAST_EXCHANGE  |PARTITIONED|
>                                       assign [$$19, $$20, $$21, $$22] <-
[function-call: asterix:create-mbr, Args:[%0->$$13, AInt32: {2}, AInt32:
{0}], function-call: asterix:create-mbr, Args:[%0->$$13, AInt32: {2},
AInt32: {1}], function-call: asterix:create-mbr, Args:[%0->$$13, AInt32:
{2}, AInt32: {2}], function-call: asterix:create-mbr, Args:[%0->$$13,
AInt32: {2}, AInt32: {3}]]
>                                       -- ASSIGN  |PARTITIONED|
>                                         project ([$$13, $$15])
>                                         -- STREAM_PROJECT  |PARTITIONED|
>                                           assign [$$13] <-
[function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {2}]]
>                                           -- ASSIGN  |PARTITIONED|
>                                             exchange
>                                             -- ONE_TO_ONE_EXCHANGE
|PARTITIONED|
>                                               data-scan []<-[$$15, $$1]
<- test:TweetMessages
>                                               -- DATASOURCE_SCAN
|PARTITIONED|
>                                                 exchange
>                                                 -- ONE_TO_ONE_EXCHANGE
|PARTITIONED|
>                                                   empty-tuple-source
>                                                   -- EMPTY_TUPLE_SOURCE
|PARTITIONED|
> {noformat}
> The optimized plan is incorrect --- the index search doesn't use the
right join condition and hence the result is different from expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message