asterixdb-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pouria (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results
Date Thu, 19 Nov 2015 19:33:11 GMT
Pouria created ASTERIXDB-1186:
---------------------------------

             Summary: Index Join Hint on an index with composite keys causes query to return
incorrect results
                 Key: ASTERIXDB-1186
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: AsterixDB, Optimizer
         Environment: AsterixDB 0.8.7-SNAPSHOT
            Reporter: Pouria
            Priority: Critical


If one of the attributes in the join predicate is a *composite* PK (consists of more than
one attribute), if the Index Join hint is used the query plan includes wrong exchange (partitioning)
which results in wrong results (only a subset of the correct total results will be returned).
The root cause of it is that the records from the PK side (which is composite) are hash-partitioned
on a "combination" of hash values of all attributes in the PK. As a result the records of
the non-PK side need to be sent to "all" partitions, but the plan contains the wrong exchange
and the runtime wont send them to all partitions. 
Below is an example:

> Here is DDL:
create type LineItemType as {
  l_orderkey: int64,
  l_partkey: int64,
  l_suppkey: int64,
  l_linenumber: int32,
  l_quantity: int32,
  l_extendedprice: double,
  l_discount: double,
  l_tax: double,
  l_returnflag: string,
  l_linestatus: string,
  l_shipdate: string,
  l_commitdate: string,
  l_receiptdate: string,
  l_shipinstruct: string,
  l_shipmode: string,
  l_comment: string
}

create type OrderType as {
  o_orderkey: int64,
  o_custkey: int64,
  o_orderstatus: string,
  o_totalprice: double,
  o_orderdate: string,
  o_orderpriority: string,
  o_clerk: string,
  o_shippriority: int32,
  o_comment: string
}

create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
create dataset Orders(OrderType)  primary key o_orderkey;

> The following query returns wrong (a subset) of results:

for $o in dataset('Orders')
for $l in dataset('LineItem')
where 
$l.l_orderkey  /*+ indexnl */ = $o.o_orderkey
return{
"o_orderkey": $o.o_orderkey,
"l_orderkey": $l.l_orderkey
}

Here is the plan - As you can see the Orders record are One-on-One exchanged (rather than
broadcast). The issue is that the l_orderkey is not the partitioning attribute of LineItem,
and matching Orders for a specific order_key can be in any partition (depending on their l_linenumber
value - which is the 2nd component of PK in LineItem)

distribute result [%0->$$13]
-- DISTRIBUTE_RESULT  |UNPARTITIONED|
  exchange 
  -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
    aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
    -- AGGREGATE  |UNPARTITIONED|
      exchange 
      -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
        aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
        -- AGGREGATE  |PARTITIONED|
          project ([$$9])
          -- STREAM_PROJECT  |PARTITIONED|
            assign [$$9] <- [function-call: asterix:closed-record-constructor, Args:[AString:
{o_orderkey}, %0->$$16, AString: {l_orderkey}, %0->$$17]]
            -- ASSIGN  |PARTITIONED|
              project ([$$17, $$16])
              -- STREAM_PROJECT  |PARTITIONED|
                exchange 
                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                  unnest-map [$$17, $$18, $$1] <- function-call: asterix:index-search,
Args:[AString: {LineItem}, AInt32: {0}, AString: {dummy}, AString: {LineItem}, ABoolean: {true},
ABoolean: {false}, ABoolean: {true}, AInt32: {1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE,
TRUE, TRUE]
                  -- BTREE_SEARCH  |PARTITIONED|
                    exchange 
                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                      project ([$$16])
                      -- STREAM_PROJECT  |PARTITIONED|
                        exchange 
                        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                          data-scan []<-[$$16, $$0] <- dummy:Orders
                          -- DATASOURCE_SCAN  |PARTITIONED|
                            exchange 
                            -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                              empty-tuple-source
                              -- EMPTY_TUPLE_SOURCE  |PARTITIONED|



 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message