asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Taewoo Kim <wangs...@gmail.com>
Subject Re: [jira] [Commented] (ASTERIXDB-1249) Self index join chooses wrong probe/index branch
Date Thu, 07 Jan 2016 23:22:07 GMT
Thanks Yingyi. Yes. If there is an equality condition and if we can't
transform a join into an index-nested loop join, then a hybrid hash join
will be used.

Best,
Taewoo

On Thu, Jan 7, 2016 at 3:14 PM, Yingyi Bu <buyingyi@gmail.com> wrote:

> +1!
> >>3. We only try to use applicable indexes from the inner branch. So, if
> >>there are no applicable indexes from the inner branch, we abort
> >>transforming a join into an index-nested-loop join.
> "index-nested-loop join" -> "hybrid hash join"?
>
> Thanks!
>
> Yingyi
>
>
>
> On Thu, Jan 7, 2016 at 3:01 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>
> > Hello dev,
> >
> > Regarding this issue (ASTERIXDB-1249), I would like to make an index-join
> > hint clarification. Let's start with an example query other than a
> > self-join query in this issue.
> >
> > for $c in dataset('Customers')
> >
> > for $o in dataset('Orders')
> >
> > where $c.cid /*+ indexnl */ = $o.cid
> >
> > order by $c.cid, $o.oid
> >
> > return {"cid":$c.cid, "oid": $o.oid}
> >
> > Right now, in the master branch, the first dataset (e.g., Customers)
> > becomes the outer branch and the second dataset (e.g., Orders) becomes
> the
> > inner branch. And, when the optimizer tries to honor the given indexnl
> hint
> > (transforming a join into an index-nested-loop join), if there are
> > applicable indexes from the inner branch (e.g., Orders), then it is going
> > to use one of those indexes. If there are no applicable indexes from the
> > inner branch, it tries to use indexes from the outer branch (e.g.,
> > Customers). We are going to change the last part; we will not use indexes
> > from the outer branch. So, the following are refined rules for
> transforming
> > a join into an index-nested-loop join.
> >
> > 1. The first dataset in a join (the first parameter of the given join)
> > becomes the outer branch.
> > 2. The second dataset in a join (the second parameter of the given join)
> > becomes the inner branch.
> > 3. We only try to use applicable indexes from the inner branch. So, if
> > there are no applicable indexes from the inner branch, we abort
> > transforming a join into an index-nested-loop join.
> > 4. Variable order in the given join predicate is not important. It can be
> > either outer.fieldA = inner.fieldB or inner.fieldB = outer.fieldA.
> >
> > So, for the left-outer join and inner join altogether, the left subtree
> is
> > the probing side and the right subtree is the index side. So, this can be
> > applied to the self-join case, too just like the following query in this
> > issue. In the following query, $t1 becomes the outer and $t2 becomes the
> > inner.
> >
> > for $t1 in dataset('TweetMessages')
> >
> > for $t2 in dataset('TweetMessages')
> >
> > let $c := $t1.countA + 20
> >
> > where $c /* +indexnl */= $t2.countB
> >
> > order by $t2.tweetid
> >
> > return {"tweetid2": $t2.tweetid, "count2":$t2.countB};
> >
> > Thank you. Any opinion would be appreciated before I finalize this fix.
> >
> > Best,
> > Taewoo
> >
> > ---------- Forwarded message ----------
> > From: Yingyi Bu (JIRA) <jira@apache.org>
> > Date: Wed, Jan 6, 2016 at 10:23 AM
> > Subject: [jira] [Commented] (ASTERIXDB-1249) Self index join chooses
> wrong
> > probe/index branch
> > To: notifications@asterixdb.incubator.apache.org
> >
> >
> >
> >     [
> >
> >
> https://issues.apache.org/jira/browse/ASTERIXDB-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085992#comment-15085992
> > ]
> >
> > Yingyi Bu commented on ASTERIXDB-1249:
> > --------------------------------------
> >
> > That's right. Basically in AcessMethod implementations, e.g.:
> >
> >
> https://github.com/apache/incubator-asterixdb/blob/master/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/RTreeAccessMethod.java#L124
> >
> > Choosing probe/index branch is only based on dataset names, instead of
> > being based on the join condition.
> >
> > > Self index join chooses wrong probe/index branch
> > > ------------------------------------------------
> > >
> > >                 Key: ASTERIXDB-1249
> > >                 URL:
> > https://issues.apache.org/jira/browse/ASTERIXDB-1249
> > >             Project: Apache AsterixDB
> > >          Issue Type: Bug
> > >          Components: Optimizer
> > >            Reporter: Yingyi Bu
> > >            Assignee: Taewoo Kim
> > >
> > > DDLs:
> > > {noformat}
> > > drop dataverse test if exists;
> > > create dataverse test;
> > > use dataverse test;
> > > create type TwitterUserType as closed {
> > >     screen-name: string,
> > >     lang: string,
> > >     friends-count: int64,
> > >     statuses-count: int64,
> > >     name: string,
> > >     followers-count: int64
> > > }
> > > create type TweetMessageType as closed {
> > >     tweetid: int64,
> > >         user: TwitterUserType,
> > >         sender-location: point,
> > >     send-time: datetime,
> > >         referred-topics: {{ string }},
> > >     message-text: string,
> > >     countA: int64,
> > >     countB: int64
> > > }
> > > create dataset TweetMessages(TweetMessageType)
> > > primary key tweetid;
> > > create index twmSndLocIx on TweetMessages(sender-location) type rtree;
> > > create index msgCountAIx on TweetMessages(countA) type btree;
> > > create index msgCountBIx on TweetMessages(countB) type btree;
> > > create index msgTextIx on TweetMessages(message-text) type keyword;
> > > {noformat}
> > > Query:
> > > {noformat}
> > > for $t1 in dataset('TweetMessages')
> > > for $t2 in dataset('TweetMessages')
> > > let $n :=  create-circle($t1.sender-location, 0.5)
> > > where spatial-intersect($t2.sender-location, $n)
> > > order by $t2.tweetid
> > > return {"tweetid2":$t2.tweetid, "loc2":$t2.sender-location};
> > > {noformat}
> > > Optimized plan:
> > > {noformat}
> > > distribute result [%0->$$10]
> > > -- DISTRIBUTE_RESULT  |PARTITIONED|
> > >   exchange
> > >   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> > >     project ([$$10])
> > >     -- STREAM_PROJECT  |PARTITIONED|
> > >       assign [$$10] <- [function-call:
> asterix:closed-record-constructor,
> > Args:[AString: {tweetid2}, %0->$$15, AString: {loc2}, %0->$$13]]
> > >       -- ASSIGN  |PARTITIONED|
> > >         exchange
> > >         -- SORT_MERGE_EXCHANGE [$$15(ASC) ]  |PARTITIONED|
> > >           order (ASC, %0->$$15)
> > >           -- STABLE_SORT [$$15(ASC)]  |PARTITIONED|
> > >             exchange
> > >             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> > >               project ([$$13, $$15])
> > >               -- STREAM_PROJECT  |PARTITIONED|
> > >                 select (function-call: asterix:spatial-intersect,
> > Args:[%0->$$13, function-call: asterix:create-circle,
> Args:[function-call:
> > asterix:field-access-by-index, Args:[%0->$$0, AInt32: {2}], ADouble:
> > {0.5}]])
> > >                 -- STREAM_SELECT  |PARTITIONED|
> > >                   project ([$$0, $$13, $$15])
> > >                   -- STREAM_PROJECT  |PARTITIONED|
> > >                     exchange
> > >                     -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> > >                       unnest-map [$$14, $$0] <- function-call:
> > asterix:index-search, Args:[AString: {TweetMessages}, AInt32: {0},
> AString:
> > {test}, AString: {TweetMessages}, ABoolean: {true}, ABoolean: {false},
> > ABoolean: {false}, AInt32: {1}, %0->$$27, AInt32: {1}, %0->$$27, TRUE,
> > TRUE, TRUE]
> > >                       -- BTREE_SEARCH  |PARTITIONED|
> > >                         exchange
> > >                         -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> > >                           order (ASC, %0->$$27)
> > >                           -- STABLE_SORT [$$27(ASC)]  |PARTITIONED|
> > >                             exchange
> > >                             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> > >                               project ([$$27, $$13, $$15])
> > >                               -- STREAM_PROJECT  |PARTITIONED|
> > >                                 exchange
> > >                                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> > >                                   unnest-map [$$23, $$24, $$25, $$26,
> > $$27] <- function-call: asterix:index-search, Args:[AString:
> {twmSndLocIx},
> > AInt32: {1}, AString: {test}, AString: {TweetMessages}, ABoolean: {true},
> > ABoolean: {false}, ABoolean: {true}, AInt32: {4}, %0->$$19, %0->$$20,
> > %0->$$21, %0->$$22]
> > >                                   -- RTREE_SEARCH  |PARTITIONED|
> > >                                     exchange
> > >                                     -- BROADCAST_EXCHANGE
> |PARTITIONED|
> > >                                       assign [$$19, $$20, $$21, $$22]
> <-
> > [function-call: asterix:create-mbr, Args:[%0->$$13, AInt32: {2}, AInt32:
> > {0}], function-call: asterix:create-mbr, Args:[%0->$$13, AInt32: {2},
> > AInt32: {1}], function-call: asterix:create-mbr, Args:[%0->$$13, AInt32:
> > {2}, AInt32: {2}], function-call: asterix:create-mbr, Args:[%0->$$13,
> > AInt32: {2}, AInt32: {3}]]
> > >                                       -- ASSIGN  |PARTITIONED|
> > >                                         project ([$$13, $$15])
> > >                                         -- STREAM_PROJECT
> |PARTITIONED|
> > >                                           assign [$$13] <-
> > [function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32:
> {2}]]
> > >                                           -- ASSIGN  |PARTITIONED|
> > >                                             exchange
> > >                                             -- ONE_TO_ONE_EXCHANGE
> > |PARTITIONED|
> > >                                               data-scan []<-[$$15, $$1]
> > <- test:TweetMessages
> > >                                               -- DATASOURCE_SCAN
> > |PARTITIONED|
> > >                                                 exchange
> > >                                                 -- ONE_TO_ONE_EXCHANGE
> > |PARTITIONED|
> > >                                                   empty-tuple-source
> > >                                                   -- EMPTY_TUPLE_SOURCE
> > |PARTITIONED|
> > > {noformat}
> > > The optimized plan is incorrect --- the index search doesn't use the
> > right join condition and hence the result is different from expected.
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.3.4#6332)
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message