asterixdb-dev mailing list archives

From Jianfeng Jia <jianfeng....@gmail.com>
Subject Re: [jira] [Created] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset
Date Tue, 10 Nov 2015 23:15:41 GMT
Actually, I’m still confused about the “cardinality” here. Isn’t the cardinality of
$ps 5?
>> let $ps := ["b","a", "b","c","c"]
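
To make the two readings concrete, here is a minimal Python sketch (not AsterixDB code) of the query quoted below. The names `tdata`, `per_binding`, and `join_sort_group` are hypothetical, and the second function is only a simplified reading of the STABLE_SORT + PRE_CLUSTERED_GROUP_BY steps in the plan, assuming the grouping is effectively on the value of $p.

```python
# Hypothetical in-memory stand-ins for dataset TData and the list $ps.
tdata = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]
ps = ["b", "a", "b", "c", "c"]


def per_binding(ps, tdata):
    """One output record per binding of $p, in the order of the list."""
    return [
        {"p": p, "match": [x["id"] for x in tdata if x["content"] == p]}
        for p in ps
    ]


def join_sort_group(ps, tdata):
    """Left outer join, then sort and group on the value of $p (assumed)."""
    joined = []
    for p in ps:
        hits = [x["id"] for x in tdata if x["content"] == p]
        if hits:
            joined.extend((p, h) for h in hits)
        else:
            # Outer join: an unmatched key keeps one row with a null right side.
            joined.append((p, None))

    joined.sort(key=lambda row: row[0])  # Python's sort is stable

    grouped = {}  # dicts keep insertion order, so groups stay sorted by key
    for p, match_id in joined:
        ids = grouped.setdefault(p, [])
        if match_id is not None:  # mirrors the not(is-null(...)) select
            ids.append(match_id)
    return [{"p": p, "match": ids} for p, ids in grouped.items()]


print(len(per_binding(ps, tdata)))      # one record per element of ps
print(len(join_sort_group(ps, tdata)))  # one record per distinct value of $p
```

Under these assumptions, the first function yields as many records as there are bindings of $p, while the second collapses duplicate keys into one group each, which is the difference in cardinality being asked about.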


> On Nov 10, 2015, at 2:50 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
> 
> Jianfeng,
> 
> The result of the query is correct.
> The cardinality of returned results should be the same as the number of
> input binding tuples for $p.
> 
> Best,
> Yingyi
> 
> 
> On Tue, Nov 10, 2015 at 2:34 PM, Jianfeng Jia (JIRA) <jira@apache.org>
> wrote:
> 
>> Jianfeng Jia created ASTERIXDB-1168:
>> ---------------------------------------
>> 
>>             Summary: Should not sort&group after an OrderedList left-join
>> with a dataset
>>                 Key: ASTERIXDB-1168
>>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
>>             Project: Apache AsterixDB
>>          Issue Type: Bug
>>          Components: Optimizer
>>            Reporter: Jianfeng Jia
>> 
>> 
>> Hi,
>> Here is the context for this issue: I wanted to look up some records in
>> the DB through the REST API, and I wanted to do the lookup in a batch. So I
>> packaged the "keys" into an OrderedList and expected a left outer join to
>> give me all matching records in an order consistent with the query order.
>> However, the result was re-sorted and grouped, which confused the
>> client-side response handler.
>> 
>> Here is a synthetic query that emulates a similar use case:
>> ---------------------------------------------------------------------------
>> drop dataverse test if exists;
>> create dataverse test;
>> 
>> use dataverse test;
>> 
>> create type TType as closed {
>>  id: int64,
>>  content: string
>> }
>> 
>> create dataset TData (TType) primary key id;
>> 
>> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content":
>> "b"}, {"id":3, "content":"c"}])
>> 
>> // now let's run the query
>> let $ps := ["b","a", "b","c","c"]
>> 
>> for $p in $ps
>> return { "p":$p,
>> "match": for $x in dataset TData where $x.content = $p return $x.id
>> }
>> ---------------------------------------------------------------------------
>> 
>> What I expected is the following:
>> ---------------------------------------------------------------------------
>> [ { "p": "b", "match": [ 2 ] }
>> , { "p": "a", "match": [ 1 ] }
>> , { "p": "b", "match": [ 2 ] }
>> , { "p": "c", "match": [ 3 ] }
>> , { "p": "c", "match": [ 3 ] }
>> ]
>> ---------------------------------------------------------------------------
>> 
>> The returned result is the following, which is aggregated and re-sorted:
>> ---------------------------------------------------------------------------
>> [ { "p": "a", "match": [ 1 ] }
>> , { "p": "b", "match": [ 2, 2 ] }
>> , { "p": "c", "match": [ 3, 3 ] }
>> ]
>> ---------------------------------------------------------------------------
>> 
>> The optimized logical plan is the following:
>> ---------------------------------------------------------------------------
>> distribute result [%0->$$4]
>> -- DISTRIBUTE_RESULT  |PARTITIONED|
>>  exchange
>>  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>    project ([$$4])
>>    -- STREAM_PROJECT  |PARTITIONED|
>>      assign [$$4] <- [function-call: asterix:closed-record-constructor,
>> Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
>>      -- ASSIGN  |PARTITIONED|
>>        project ([$$1, $$9])
>>        -- STREAM_PROJECT  |PARTITIONED|
>>          exchange
>>          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>            group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
>>                      aggregate [$$9] <- [function-call: asterix:listify,
>> Args:[%0->$$10]]
>>                      -- AGGREGATE  |LOCAL|
>>                        select (function-call: algebricks:not,
>> Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
>>                        -- STREAM_SELECT  |LOCAL|
>>                          nested tuple source
>>                          -- NESTED_TUPLE_SOURCE  |LOCAL|
>>                   }
>>            -- PRE_CLUSTERED_GROUP_BY[$$12, $$13]  |PARTITIONED|
>>              exchange
>>              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>                order (ASC, %0->$$12) (ASC, %0->$$13)
>>                -- STABLE_SORT [$$12(ASC), $$13(ASC)]  |PARTITIONED|
>>                  exchange
>>                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>                    project ([$$10, $$11, $$12, $$13])
>>                    -- STREAM_PROJECT  |PARTITIONED|
>>                      exchange
>>                      -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>                        left outer join (function-call: algebricks:eq,
>> Args:[%0->$$14, %0->$$13])
>>                        -- HYBRID_HASH_JOIN [$$13][$$14]  |PARTITIONED|
>>                          exchange
>>                          -- HASH_PARTITION_EXCHANGE [$$13]  |PARTITIONED|
>>                            unnest $$13 <- function-call:
>> asterix:scan-collection, Args:[%0->$$12]
>>                            -- UNNEST  |UNPARTITIONED|
>>                              assign [$$12] <- [AOrderedList: [ AString:
>> {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
>>                              -- ASSIGN  |UNPARTITIONED|
>>                                empty-tuple-source
>>                                -- EMPTY_TUPLE_SOURCE  |UNPARTITIONED|
>>                          exchange
>>                          -- HASH_PARTITION_EXCHANGE [$$14]  |PARTITIONED|
>>                            project ([$$10, $$11, $$14])
>>                            -- STREAM_PROJECT  |PARTITIONED|
>>                              assign [$$11, $$14] <- [TRUE, function-call:
>> asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
>>                              -- ASSIGN  |PARTITIONED|
>>                                exchange
>>                                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>                                  data-scan []<-[$$10, $$2] <- test:TData
>>                                  -- DATASOURCE_SCAN  |PARTITIONED|
>>                                    exchange
>>                                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>                                      empty-tuple-source
>>                                      -- EMPTY_TUPLE_SOURCE
>> 
>> ---------------------------------------------------------------------------------
>> 
>> Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer
>> join?
>> We can close this issue if this is the intended design.
>> 
>> 
>> 
>> 
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine

