Return-Path: X-Original-To: apmail-asterixdb-dev-archive@minotaur.apache.org Delivered-To: apmail-asterixdb-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5F65F1806A for ; Thu, 7 Jan 2016 23:14:14 +0000 (UTC) Received: (qmail 47585 invoked by uid 500); 7 Jan 2016 23:14:14 -0000 Delivered-To: apmail-asterixdb-dev-archive@asterixdb.apache.org Received: (qmail 47530 invoked by uid 500); 7 Jan 2016 23:14:14 -0000 Mailing-List: contact dev-help@asterixdb.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@asterixdb.incubator.apache.org Delivered-To: mailing list dev@asterixdb.incubator.apache.org Received: (qmail 47517 invoked by uid 99); 7 Jan 2016 23:14:13 -0000 Received: from Unknown (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 07 Jan 2016 23:14:13 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 7558BC0FFA for ; Thu, 7 Jan 2016 23:14:13 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.88 X-Spam-Level: ** X-Spam-Status: No, score=2.88 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-eu-west.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id BQXcD2SUP3hk for ; Thu, 7 Jan 2016 23:14:11 +0000 (UTC) Received: from mail-qk0-f174.google.com (mail-qk0-f174.google.com [209.85.220.174]) by mx1-eu-west.apache.org (ASF Mail Server at mx1-eu-west.apache.org) with ESMTPS id ADBCF20511 for ; Thu, 7 Jan 2016 23:14:10 +0000 (UTC) Received: by mail-qk0-f174.google.com with SMTP id n135so193971492qka.2 for ; Thu, 07 Jan 2016 15:14:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=PySv2HeGcOgSW3/7BTa7nVrTjDYfwe65nU6aZTRHKds=; b=z5AzGl8lY5IfxbGKwKCuh5xEiGN7dhywHbb0gXsvE3qfByBNU3TvvsMF9zGkrjyo0t /uWhl9BB+Tq+wgtd6tgANrduQnFcpQfBdKPo/xDuP1WAMLuxsD20qJ88XW9PE2ovEsfp 7zHy5F0IY5y3P78ZhLGKcQ6cqeQR1fMmkOiAxZ7TO2Sq4VDHFkflRM7C6UgmirHCmmTA pygZ9yoUrnQt8XXxDyVu9g4ZZqWbEaGbyb1ipe/wAtygqScE7fBv1kavyQSxwAiU6W7A 9VcWHb3z50ixq7r7mlkCC0KohvsREZI2BaqooPCAqWUsvMDvkSUzE95V6MFtQTOGMv36 nh5w== MIME-Version: 1.0 X-Received: by 10.129.98.135 with SMTP id w129mr77494433ywb.191.1452208443857; Thu, 07 Jan 2016 15:14:03 -0800 (PST) Received: by 10.37.112.70 with HTTP; Thu, 7 Jan 2016 15:14:03 -0800 (PST) In-Reply-To: References: Date: Thu, 7 Jan 2016 15:14:03 -0800 Message-ID: Subject: Re: [jira] [Commented] (ASTERIXDB-1249) Self index join chooses wrong probe/index branch From: Yingyi Bu To: dev@asterixdb.incubator.apache.org Content-Type: multipart/alternative; boundary=001a1146eee86fa6f80528c6a002 --001a1146eee86fa6f80528c6a002 Content-Type: text/plain; charset=UTF-8 +1! >>3. We only try to use applicable indexes from the inner branch. So, if >>there are no applicable indexes from the inner branch, we abort >>transforming a join into an index-nested-loop join. "index-nested-loop join" -> "hybrid hash join"? Thanks! Yingyi On Thu, Jan 7, 2016 at 3:01 PM, Taewoo Kim wrote: > Hello dev, > > Regarding this issue (ASTERIXDB-1249), I would like to make an index-join > hint clarification. Let's start with an example query other than a > self-join query in this issue. > > for $c in dataset('Customers') > > for $o in dataset('Orders') > > where $c.cid /*+ indexnl */ = $o.cid > > order by $c.cid, $o.oid > > return {"cid":$c.cid, "oid": $o.oid} > > Right now, in the master branch, the first dataset (e.g., Customers) > becomes the outer branch and the second dataset (e.g., Orders) becomes the > inner branch. And, when the optimizer tries to honor the given indexnl hint > (transforming a join into an index-nested-loop join), if there are > applicable indexes from the inner branch (e.g., Orders), then it is going > to use one of those indexes. If there are no applicable indexes from the > inner branch, it tries to use indexes from the outer branch (e.g., > Customers). We are going to change the last part; we will not use indexes > from the outer branch. So, the following are refined rules for transforming > a join into an index-nested-loop join. > > 1. The first dataset in a join (the first parameter of the given join) > becomes the outer branch. > 2. The second dataset in a join (the second parameter of the given join) > becomes the inner branch. > 3. We only try to use applicable indexes from the inner branch. So, if > there are no applicable indexes from the inner branch, we abort > transforming a join into an index-nested-loop join. > 4. Variable order in the given join predicate is not important. It can be > either outer.fieldA = inner.fieldB or inner.fieldB = outer.fieldA. > > So, for the left-outer join and inner join altogether, the left subtree is > the probing side and the right subtree is the index side. So, this can be > applied to the self-join case, too just like the following query in this > issue. In the following query, $t1 becomes the outer and $t2 becomes the > inner. > > for $t1 in dataset('TweetMessages') > > for $t2 in dataset('TweetMessages') > > let $c := $t1.countA + 20 > > where $c /* +indexnl */= $t2.countB > > order by $t2.tweetid > > return {"tweetid2": $t2.tweetid, "count2":$t2.countB}; > > Thank you. Any opinion would be appreciated before I finalize this fix. > > Best, > Taewoo > > ---------- Forwarded message ---------- > From: Yingyi Bu (JIRA) > Date: Wed, Jan 6, 2016 at 10:23 AM > Subject: [jira] [Commented] (ASTERIXDB-1249) Self index join chooses wrong > probe/index branch > To: notifications@asterixdb.incubator.apache.org > > > > [ > > https://issues.apache.org/jira/browse/ASTERIXDB-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085992#comment-15085992 > ] > > Yingyi Bu commented on ASTERIXDB-1249: > -------------------------------------- > > That's right. Basically in AcessMethod implementations, e.g.: > > https://github.com/apache/incubator-asterixdb/blob/master/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/am/RTreeAccessMethod.java#L124 > > Choosing probe/index branch is only based on dataset names, instead of > being based on the join condition. > > > Self index join chooses wrong probe/index branch > > ------------------------------------------------ > > > > Key: ASTERIXDB-1249 > > URL: > https://issues.apache.org/jira/browse/ASTERIXDB-1249 > > Project: Apache AsterixDB > > Issue Type: Bug > > Components: Optimizer > > Reporter: Yingyi Bu > > Assignee: Taewoo Kim > > > > DDLs: > > {noformat} > > drop dataverse test if exists; > > create dataverse test; > > use dataverse test; > > create type TwitterUserType as closed { > > screen-name: string, > > lang: string, > > friends-count: int64, > > statuses-count: int64, > > name: string, > > followers-count: int64 > > } > > create type TweetMessageType as closed { > > tweetid: int64, > > user: TwitterUserType, > > sender-location: point, > > send-time: datetime, > > referred-topics: {{ string }}, > > message-text: string, > > countA: int64, > > countB: int64 > > } > > create dataset TweetMessages(TweetMessageType) > > primary key tweetid; > > create index twmSndLocIx on TweetMessages(sender-location) type rtree; > > create index msgCountAIx on TweetMessages(countA) type btree; > > create index msgCountBIx on TweetMessages(countB) type btree; > > create index msgTextIx on TweetMessages(message-text) type keyword; > > {noformat} > > Query: > > {noformat} > > for $t1 in dataset('TweetMessages') > > for $t2 in dataset('TweetMessages') > > let $n := create-circle($t1.sender-location, 0.5) > > where spatial-intersect($t2.sender-location, $n) > > order by $t2.tweetid > > return {"tweetid2":$t2.tweetid, "loc2":$t2.sender-location}; > > {noformat} > > Optimized plan: > > {noformat} > > distribute result [%0->$$10] > > -- DISTRIBUTE_RESULT |PARTITIONED| > > exchange > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > project ([$$10]) > > -- STREAM_PROJECT |PARTITIONED| > > assign [$$10] <- [function-call: asterix:closed-record-constructor, > Args:[AString: {tweetid2}, %0->$$15, AString: {loc2}, %0->$$13]] > > -- ASSIGN |PARTITIONED| > > exchange > > -- SORT_MERGE_EXCHANGE [$$15(ASC) ] |PARTITIONED| > > order (ASC, %0->$$15) > > -- STABLE_SORT [$$15(ASC)] |PARTITIONED| > > exchange > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > project ([$$13, $$15]) > > -- STREAM_PROJECT |PARTITIONED| > > select (function-call: asterix:spatial-intersect, > Args:[%0->$$13, function-call: asterix:create-circle, Args:[function-call: > asterix:field-access-by-index, Args:[%0->$$0, AInt32: {2}], ADouble: > {0.5}]]) > > -- STREAM_SELECT |PARTITIONED| > > project ([$$0, $$13, $$15]) > > -- STREAM_PROJECT |PARTITIONED| > > exchange > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > unnest-map [$$14, $$0] <- function-call: > asterix:index-search, Args:[AString: {TweetMessages}, AInt32: {0}, AString: > {test}, AString: {TweetMessages}, ABoolean: {true}, ABoolean: {false}, > ABoolean: {false}, AInt32: {1}, %0->$$27, AInt32: {1}, %0->$$27, TRUE, > TRUE, TRUE] > > -- BTREE_SEARCH |PARTITIONED| > > exchange > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > order (ASC, %0->$$27) > > -- STABLE_SORT [$$27(ASC)] |PARTITIONED| > > exchange > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > project ([$$27, $$13, $$15]) > > -- STREAM_PROJECT |PARTITIONED| > > exchange > > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > > unnest-map [$$23, $$24, $$25, $$26, > $$27] <- function-call: asterix:index-search, Args:[AString: {twmSndLocIx}, > AInt32: {1}, AString: {test}, AString: {TweetMessages}, ABoolean: {true}, > ABoolean: {false}, ABoolean: {true}, AInt32: {4}, %0->$$19, %0->$$20, > %0->$$21, %0->$$22] > > -- RTREE_SEARCH |PARTITIONED| > > exchange > > -- BROADCAST_EXCHANGE |PARTITIONED| > > assign [$$19, $$20, $$21, $$22] <- > [function-call: asterix:create-mbr, Args:[%0->$$13, AInt32: {2}, AInt32: > {0}], function-call: asterix:create-mbr, Args:[%0->$$13, AInt32: {2}, > AInt32: {1}], function-call: asterix:create-mbr, Args:[%0->$$13, AInt32: > {2}, AInt32: {2}], function-call: asterix:create-mbr, Args:[%0->$$13, > AInt32: {2}, AInt32: {3}]] > > -- ASSIGN |PARTITIONED| > > project ([$$13, $$15]) > > -- STREAM_PROJECT |PARTITIONED| > > assign [$$13] <- > [function-call: asterix:field-access-by-index, Args:[%0->$$1, AInt32: {2}]] > > -- ASSIGN |PARTITIONED| > > exchange > > -- ONE_TO_ONE_EXCHANGE > |PARTITIONED| > > data-scan []<-[$$15, $$1] > <- test:TweetMessages > > -- DATASOURCE_SCAN > |PARTITIONED| > > exchange > > -- ONE_TO_ONE_EXCHANGE > |PARTITIONED| > > empty-tuple-source > > -- EMPTY_TUPLE_SOURCE > |PARTITIONED| > > {noformat} > > The optimized plan is incorrect --- the index search doesn't use the > right join condition and hence the result is different from expected. > > > > -- > This message was sent by Atlassian JIRA > (v6.3.4#6332) > --001a1146eee86fa6f80528c6a002--