Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id CA8CE200B78 for ; Fri, 19 Aug 2016 01:20:26 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id C912D160AB7; Thu, 18 Aug 2016 23:20:26 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id E6D54160AAE for ; Fri, 19 Aug 2016 01:20:25 +0200 (CEST) Received: (qmail 48653 invoked by uid 500); 18 Aug 2016 23:20:25 -0000 Mailing-List: contact notifications-help@asterixdb.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@asterixdb.apache.org Delivered-To: mailing list notifications@asterixdb.apache.org Received: (qmail 48644 invoked by uid 99); 18 Aug 2016 23:20:25 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 18 Aug 2016 23:20:25 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id B6EEBC214D for ; Thu, 18 Aug 2016 23:20:24 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -3.739 X-Spam-Level: X-Spam-Status: No, score=-3.739 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, KAM_LAZY_DOMAIN_SECURITY=1, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-0.519] autolearn=disabled Received: from mx2-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id BJOnRRPhzIR3 for ; Thu, 18 Aug 2016 23:20:23 +0000 (UTC) Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx2-lw-eu.apache.org (ASF Mail Server at mx2-lw-eu.apache.org) with SMTP id 9E9935F5A1 for ; Thu, 18 Aug 2016 23:20:22 +0000 (UTC) Received: (qmail 48571 invoked by uid 99); 18 Aug 2016 23:20:21 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 18 Aug 2016 23:20:21 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id AF4BF2C02A8 for ; Thu, 18 Aug 2016 23:20:21 +0000 (UTC) Date: Thu, 18 Aug 2016 23:20:21 +0000 (UTC) From: "Taewoo Kim (JIRA)" To: notifications@asterixdb.incubator.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 18 Aug 2016 23:20:27 -0000 [ https://issues.apache.org/jira/browse/ASTERIXDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427356#comment-15427356 ] Taewoo Kim commented on ASTERIXDB-1186: --------------------------------------- The result on the current master is Interesting: the first query generates the correct (LineItem - outer, Orders - inner) result while the second (LineItem - inner, Orders - outer) generates wrong result. In summary, if LineItem is outer, it correctly sends the data. The problem of exchange written here does not exist in the current master. However, I am not sure why the second one doesn't generate the correct result. Does anyone have an idea? #1. LineItem - outer, Orders - inner {code} for $l in dataset('LineItem') for $o in dataset('Orders') where $l.l_orderkey /*+ indexnl */ = $o.o_orderkey return { "o_orderkey": $o.o_orderkey, "l_orderkey": $l.l_orderkey } {code} {code} distribute result [%0->$$7] -- DISTRIBUTE_RESULT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| project ([$$7]) -- STREAM_PROJECT |PARTITIONED| assign [$$7] <- [function-call: asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$14, AString: {l_orderkey}, %0->$$12]] -- ASSIGN |PARTITIONED| project ([$$12, $$14]) -- STREAM_PROJECT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| unnest-map [$$14, $$1] <- function-call: asterix:index-search, Args:[AString: {Orders}, AInt32: {0}, AString: {test}, AString: {Orders}, ABoolean: {true}, ABoolean: {true}, AInt32: {1}, %0->$$12, AInt32: {1}, %0->$$12, TRUE, TRUE, TRUE] -- BTREE_SEARCH |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| project ([$$12]) -- STREAM_PROJECT |PARTITIONED| exchange -- HASH_PARTITION_MERGE_EXCHANGE MERGE:[$$12(ASC), $$13(ASC)] HASH:[$$12] |PARTITIONED| project ([$$12, $$13]) -- STREAM_PROJECT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| data-scan []<-[$$12, $$13, $$0] <- test:LineItem -- DATASOURCE_SCAN |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| empty-tuple-source -- EMPTY_TUPLE_SOURCE |PARTITIONED| {code} #2. LineItem - inner, Orders - outer {code} for $o in dataset('Orders') for $l in dataset('LineItem') where $l.l_orderkey /*+ indexnl */ = $o.o_orderkey return { "o_orderkey": $o.o_orderkey, "l_orderkey": $l.l_orderkey } {code} {code} distribute result [%0->$$7] -- DISTRIBUTE_RESULT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| project ([$$7]) -- STREAM_PROJECT |PARTITIONED| assign [$$7] <- [function-call: asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$12, AString: {l_orderkey}, %0->$$13]] -- ASSIGN |PARTITIONED| project ([$$12, $$13]) -- STREAM_PROJECT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| unnest-map [$$13, $$14, $$1] <- function-call: asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: {test}, AString: {LineItem}, ABoolean: {true}, ABoolean: {true}, AInt32: {1}, %0->$$12, AInt32: {1}, %0->$$12, TRUE, TRUE, TRUE] -- BTREE_SEARCH |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| project ([$$12]) -- STREAM_PROJECT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| data-scan []<-[$$12, $$0] <- test:Orders -- DATASOURCE_SCAN |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| empty-tuple-source -- EMPTY_TUPLE_SOURCE |PARTITIONED| {code} > Index Join Hint on an index with composite keys causes query to return incorrect results > ---------------------------------------------------------------------------------------- > > Key: ASTERIXDB-1186 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186 > Project: Apache AsterixDB > Issue Type: Bug > Components: AsterixDB, Optimizer > Environment: AsterixDB 0.8.7-SNAPSHOT > Reporter: Pouria > Assignee: Taewoo Kim > Priority: Critical > Labels: soon > > If one of the attributes in the join predicate is a *composite* PK (consists of more than one attribute), if the Index Join hint is used the query plan includes wrong exchange (partitioning) which results in wrong results (only a subset of the correct total results will be returned). > The root cause of it is that the records from the PK side (which is composite) are hash-partitioned on a "combination" of hash values of all attributes in the PK. As a result the records of the non-PK side need to be sent to "all" partitions, but the plan contains the wrong exchange and the runtime wont send them to all partitions. > Below is an example: > > Here is DDL: > create type LineItemType as { > l_orderkey: int64, > l_partkey: int64, > l_suppkey: int64, > l_linenumber: int32, > l_quantity: int32, > l_extendedprice: double, > l_discount: double, > l_tax: double, > l_returnflag: string, > l_linestatus: string, > l_shipdate: string, > l_commitdate: string, > l_receiptdate: string, > l_shipinstruct: string, > l_shipmode: string, > l_comment: string > } > create type OrderType as { > o_orderkey: int64, > o_custkey: int64, > o_orderstatus: string, > o_totalprice: double, > o_orderdate: string, > o_orderpriority: string, > o_clerk: string, > o_shippriority: int32, > o_comment: string > } > create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber; > create dataset Orders(OrderType) primary key o_orderkey; > > The following query returns wrong (a subset) of results: > for $o in dataset('Orders') > for $l in dataset('LineItem') > where > $l.l_orderkey /*+ indexnl */ = $o.o_orderkey > return{ > "o_orderkey": $o.o_orderkey, > "l_orderkey": $l.l_orderkey > } > Here is the plan - As you can see the Orders record are One-on-One exchanged (rather than broadcast). The issue is that the l_orderkey is not the partitioning attribute of LineItem, and matching Orders for a specific order_key can be in any partition (depending on their l_linenumber value - which is the 2nd component of PK in LineItem) > distribute result [%0->$$13] > -- DISTRIBUTE_RESULT |UNPARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |UNPARTITIONED| > aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]] > -- AGGREGATE |UNPARTITIONED| > exchange > -- RANDOM_MERGE_EXCHANGE |PARTITIONED| > aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]] > -- AGGREGATE |PARTITIONED| > project ([$$9]) > -- STREAM_PROJECT |PARTITIONED| > assign [$$9] <- [function-call: asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$16, AString: {l_orderkey}, %0->$$17]] > -- ASSIGN |PARTITIONED| > project ([$$17, $$16]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > unnest-map [$$17, $$18, $$1] <- function-call: asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: {dummy}, AString: {LineItem}, ABoolean: {true}, ABoolean: {false}, ABoolean: {true}, AInt32: {1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE, TRUE, TRUE] > -- BTREE_SEARCH |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$16]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > data-scan []<-[$$16, $$0] <- dummy:Orders > -- DATASOURCE_SCAN |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |PARTITIONED| > -- This message was sent by Atlassian JIRA (v6.3.4#6332)