asterixdb-dev mailing list archives

From Jianfeng Jia <jianfeng....@gmail.com>
Subject Re: [jira] [Created] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset
Date Tue, 10 Nov 2015 23:25:38 GMT
No problem. Let me re-open it. 

> On Nov 10, 2015, at 3:20 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
> 
> Ah, yes!
> So this should be a bug then...
> 
> Best,
> Yingyi
> 
> 
> On Tue, Nov 10, 2015 at 3:15 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
> wrote:
> 
>> Actually, I’m still confused about the “cardinality” here. Isn’t the
>> cardinality of $ps 5?
>>>> let $ps := ["b","a", "b","c","c"]
>> 
>> 
>>> On Nov 10, 2015, at 2:50 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>> 
>>> Jianfeng,
>>> 
>>> The result of the query is correct.
>>> The cardinality of returned results should be the same as the number of
>>> input binding tuples for $p.
>>> 
>>> Best,
>>> Yingyi
>>> 
>>> 
>>> On Tue, Nov 10, 2015 at 2:34 PM, Jianfeng Jia (JIRA) <jira@apache.org>
>>> wrote:
>>> 
>>>> Jianfeng Jia created ASTERIXDB-1168:
>>>> ---------------------------------------
>>>> 
>>>>            Summary: Should not sort&group after an OrderedList left-join with a dataset
>>>>                Key: ASTERIXDB-1168
>>>>                URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
>>>>            Project: Apache AsterixDB
>>>>         Issue Type: Bug
>>>>         Components: Optimizer
>>>>           Reporter: Jianfeng Jia
>>>> 
>>>> 
>>>> Hi,
>>>> Here is the context for this issue: I wanted to look up some records in
>>>> the DB through the REST API, and I wanted to look them up in a batch. So I
>>>> packaged the "keys" into an OrderedList and expected a left outer join to
>>>> give me all matching records in an order consistent with the query order.
>>>> However, the result was re-sorted and grouped, which confused the
>>>> client-side response handler.
>>>> 
>>>> Here is a synthetic query that emulates a similar use case:
>>>>
>>>> ---------------------------------------------------------------------------
>>>> drop dataverse test if exists;
>>>> create dataverse test;
>>>> 
>>>> use dataverse test;
>>>> 
>>>> create type TType as closed {
>>>> id: int64,
>>>> content: string
>>>> }
>>>> 
>>>> create dataset TData (TType) primary key id;
>>>> 
>>>> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content":"b"}, {"id":3, "content":"c"}])
>>>> 
>>>> // now let's query on
>>>> let $ps := ["b","a", "b","c","c"]
>>>> 
>>>> for $p in $ps
>>>> return { "p":$p,
>>>> "match": for $x in dataset TData where $x.content = $p return $x.id
>>>> }
>>>> 
>>>> ---------------------------------------------------------------------------
>>>> 
>>>> What I expected is the following:
>>>>
>>>> ---------------------------------------------------------------------------
>>>> [ { "p": "b", "match": [ 2 ] }
>>>> , { "p": "a", "match": [ 1 ] }
>>>> , { "p": "b", "match": [ 2 ] }
>>>> , { "p": "c", "match": [ 3 ] }
>>>> , { "p": "c", "match": [ 3 ] }
>>>> ]
>>>> 
>>>> ---------------------------------------------------------------------------
>>>> 
>>>> The returned result is the following, which is aggregated and re-sorted:
>>>>
>>>> ---------------------------------------------------------------------------
>>>> [ { "p": "a", "match": [ 1 ] }
>>>> , { "p": "b", "match": [ 2, 2 ] }
>>>> , { "p": "c", "match": [ 3, 3 ] }
>>>> ]
>>>> 
>>>> ---------------------------------------------------------------------------
>>>> 
>>>> The optimized logical plan is the following:
>>>>
>>>> ---------------------------------------------------------------------------
>>>> distribute result [%0->$$4]
>>>> -- DISTRIBUTE_RESULT  |PARTITIONED|
>>>> exchange
>>>> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>>>   project ([$$4])
>>>>   -- STREAM_PROJECT  |PARTITIONED|
>>>>     assign [$$4] <- [function-call: asterix:closed-record-constructor,
>>>> Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
>>>>     -- ASSIGN  |PARTITIONED|
>>>>       project ([$$1, $$9])
>>>>       -- STREAM_PROJECT  |PARTITIONED|
>>>>         exchange
>>>>         -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>>>           group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
>>>>                     aggregate [$$9] <- [function-call: asterix:listify,
>>>> Args:[%0->$$10]]
>>>>                     -- AGGREGATE  |LOCAL|
>>>>                       select (function-call: algebricks:not,
>>>> Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
>>>>                       -- STREAM_SELECT  |LOCAL|
>>>>                         nested tuple source
>>>>                         -- NESTED_TUPLE_SOURCE  |LOCAL|
>>>>                  }
>>>>           -- PRE_CLUSTERED_GROUP_BY[$$12, $$13]  |PARTITIONED|
>>>>             exchange
>>>>             -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>>>               order (ASC, %0->$$12) (ASC, %0->$$13)
>>>>               -- STABLE_SORT [$$12(ASC), $$13(ASC)]  |PARTITIONED|
>>>>                 exchange
>>>>                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>>>                   project ([$$10, $$11, $$12, $$13])
>>>>                   -- STREAM_PROJECT  |PARTITIONED|
>>>>                     exchange
>>>>                     -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>>>                       left outer join (function-call: algebricks:eq,
>>>> Args:[%0->$$14, %0->$$13])
>>>>                       -- HYBRID_HASH_JOIN [$$13][$$14]  |PARTITIONED|
>>>>                         exchange
>>>>                         -- HASH_PARTITION_EXCHANGE [$$13]  |PARTITIONED|
>>>>                           unnest $$13 <- function-call:
>>>> asterix:scan-collection, Args:[%0->$$12]
>>>>                           -- UNNEST  |UNPARTITIONED|
>>>>                             assign [$$12] <- [AOrderedList: [ AString:
>>>> {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
>>>>                             -- ASSIGN  |UNPARTITIONED|
>>>>                               empty-tuple-source
>>>>                               -- EMPTY_TUPLE_SOURCE  |UNPARTITIONED|
>>>>                         exchange
>>>>                         -- HASH_PARTITION_EXCHANGE [$$14]  |PARTITIONED|
>>>>                           project ([$$10, $$11, $$14])
>>>>                           -- STREAM_PROJECT  |PARTITIONED|
>>>>                             assign [$$11, $$14] <- [TRUE, function-call:
>>>> asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
>>>>                             -- ASSIGN  |PARTITIONED|
>>>>                               exchange
>>>>                               -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>>>                                 data-scan []<-[$$10, $$2] <- test:TData
>>>>                                 -- DATASOURCE_SCAN  |PARTITIONED|
>>>>                                   exchange
>>>>                                   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>>>>                                     empty-tuple-source
>>>>                                     -- EMPTY_TUPLE_SOURCE
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------------------
>>>> 
>>>> Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer
>>>> join?
>>>> We can close this issue if this is the intended design.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v6.3.4#6332)
>>>> 
>> 
>> 
>> 
>> Best,
>> 
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>> 
>> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine
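
For anyone reading this thread later, here is a minimal Python sketch of the cardinality point discussed above. It is not AsterixDB code and the function names are made up for illustration. It only assumes what the plan in the report shows: the group-by keys are the list itself and the value of $p, so duplicate values of $p fall into the same group. The first function produces one record per binding of $p in list order; the second emulates a left outer join followed by a sort and a group-by on the value alone.

```python
# Data copied from the report: the three TData records and the key list $ps.
TDATA = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]
PS = ["b", "a", "b", "c", "c"]


def per_binding(ps, data):
    """One output record per binding of $p, in list order (hypothetical helper)."""
    return [
        {"p": p, "match": [x["id"] for x in data if x["content"] == p]}
        for p in ps
    ]


def sort_then_group(ps, data):
    """Left outer join, then sort and group on the value of $p only
    (hypothetical helper emulating the reported behaviour)."""
    joined = []
    for p in ps:
        matches = [x["id"] for x in data if x["content"] == p]
        # A left outer join keeps an unmatched key, with None on the right side.
        joined.extend((p, m) for m in (matches or [None]))
    joined.sort(key=lambda pair: pair[0])  # Python's sort is stable
    grouped = []
    for p, m in joined:
        # Start a new group whenever the key value changes.
        if not grouped or grouped[-1]["p"] != p:
            grouped.append({"p": p, "match": []})
        if m is not None:
            grouped[-1]["match"].append(m)
    return grouped


if __name__ == "__main__":
    print(per_binding(PS, TDATA))
    print(sort_then_group(PS, TDATA))
```

With the data from the report, the first function returns five records (one per element of $ps) and the second returns three, with the matches for "b" and "c" merged. That is the difference between the expected and the returned result in the issue.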

