asterixdb-notifications mailing list archives

From "Jianfeng Jia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset
Date Wed, 11 Nov 2015 01:14:10 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999744#comment-14999744 ]

Jianfeng Jia commented on ASTERIXDB-1168:
-----------------------------------------

The left outer join above generates the following result:

[ { "p": "a", "match": 1 }
, { "p": "b", "match": 2 }
, { "p": "c", "match": 3 }
 ]

It was aggregated in the same way as my original left outer join query.

The join in your first comment generates the following result:

[ { "p": "b", "match": 2 }
, { "p": "a", "match": 1 }
, { "p": "b", "match": 2 }
, { "p": "c", "match": 3 }
, { "p": "c", "match": 3 }
 ]

It keeps the order of the input sequence, so I can use the result directly without reordering.
What confuses me is why we need to sort and group by for a left outer join, but not for
an equi-join.

> Should not sort&group after an OrderedList left-join with a dataset
> -------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1168
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Optimizer
>            Reporter: Jianfeng Jia
>
> Hi,
> Here is the context for this issue: I wanted to look up some records in the DB through
> the REST API, and I wanted to do the lookup in batches. I packaged the "keys" into an
> OrderedList and expected a left outer join to give me all matching records in an order
> consistent with the query. However, the result was re-sorted and grouped, which confused
> the client-side response handler.
> Here is a synthetic query that emulates a similar use case:
> ---------------------------------------------------------------------------
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
> create type TType as closed {
>   id: int64,
>   content: string
> }
> create dataset TData (TType) primary key id;
> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content": "b"}, {"id":3, "content":"c"}])
> // now let's query on
> let $ps := ["b","a", "b","c","c"]
> for $p in $ps
> return { "p":$p,
> "match": for $x in dataset TData where $x.content = $p return $x.id
> }
> ---------------------------------------------------------------------------
> What I expected is the following:
> ---------------------------------------------------------------------------
> [ { "p": "b", "match": [ 2 ] }
> , { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2 ] }
> , { "p": "c", "match": [ 3 ] }
> , { "p": "c", "match": [ 3 ] }
>  ]
> ---------------------------------------------------------------------------
> The returned result is the following, which is aggregated and re-sorted:
> ---------------------------------------------------------------------------
> [ { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2, 2 ] }
> , { "p": "c", "match": [ 3, 3 ] }
>  ]
> ---------------------------------------------------------------------------
> The optimized logical plan is the following:
> ---------------------------------------------------------------------------
> distribute result [%0->$$4]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange 
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     project ([$$4])
>     -- STREAM_PROJECT  |PARTITIONED|
>       assign [$$4] <- [function-call: asterix:closed-record-constructor, Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
>       -- ASSIGN  |PARTITIONED|
>         project ([$$1, $$9])
>         -- STREAM_PROJECT  |PARTITIONED|
>           exchange 
>           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>             group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
>                       aggregate [$$9] <- [function-call: asterix:listify, Args:[%0->$$10]]
>                       -- AGGREGATE  |LOCAL|
>                         select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
>                         -- STREAM_SELECT  |LOCAL|
>                           nested tuple source
>                           -- NESTED_TUPLE_SOURCE  |LOCAL|
>                    }
>             -- PRE_CLUSTERED_GROUP_BY[$$12, $$13]  |PARTITIONED|
>               exchange 
>               -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                 order (ASC, %0->$$12) (ASC, %0->$$13) 
>                 -- STABLE_SORT [$$12(ASC), $$13(ASC)]  |PARTITIONED|
>                   exchange 
>                   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                     project ([$$10, $$11, $$12, $$13])
>                     -- STREAM_PROJECT  |PARTITIONED|
>                       exchange 
>                       -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                         left outer join (function-call: algebricks:eq, Args:[%0->$$14, %0->$$13])
>                         -- HYBRID_HASH_JOIN [$$13][$$14]  |PARTITIONED|
>                           exchange 
>                           -- HASH_PARTITION_EXCHANGE [$$13]  |PARTITIONED|
>                             unnest $$13 <- function-call: asterix:scan-collection, Args:[%0->$$12]
>                             -- UNNEST  |UNPARTITIONED|
>                               assign [$$12] <- [AOrderedList: [ AString: {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
>                               -- ASSIGN  |UNPARTITIONED|
>                                 empty-tuple-source
>                                 -- EMPTY_TUPLE_SOURCE  |UNPARTITIONED|
>                           exchange 
>                           -- HASH_PARTITION_EXCHANGE [$$14]  |PARTITIONED|
>                             project ([$$10, $$11, $$14])
>                             -- STREAM_PROJECT  |PARTITIONED|
>                               assign [$$11, $$14] <- [TRUE, function-call: asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
>                               -- ASSIGN  |PARTITIONED|
>                                 exchange 
>                                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                   data-scan []<-[$$10, $$2] <- test:TData
>                                   -- DATASOURCE_SCAN  |PARTITIONED|
>                                     exchange 
>                                     -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                       empty-tuple-source
>                                       -- EMPTY_TUPLE_SOURCE 
> ---------------------------------------------------------------------------------
> Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer join?
> We can close this issue if this is the intended design.
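
The effect of the plan can be illustrated with a minimal Python sketch. This is not AsterixDB code; it only simulates the operators named in the plan under the assumption that the group-by key is the value of $p (the plan groups on $$12, the whole list, and $$13, the unnested item), so duplicate items in the list fall into the same group. The names TDATA, PS, left_outer_join, sort_then_group, and group_by_position are hypothetical and chosen for illustration only.

```python
from itertools import groupby

# Hypothetical in-memory stand-ins for the dataset and key list from the issue.
TDATA = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]
PS = ["b", "a", "b", "c", "c"]


def left_outer_join(ps, data):
    """Emit one (p, id) row per match; id is None when p matches nothing."""
    rows = []
    for p in ps:
        ids = [rec["id"] for rec in data if rec["content"] == p]
        rows.extend((p, i) for i in (ids or [None]))
    return rows


def sort_then_group(rows):
    """Simulate STABLE_SORT + PRE_CLUSTERED_GROUP_BY keyed on the value of p."""
    ordered = sorted(rows, key=lambda row: row[0])  # Python's sort is stable
    return [
        # Like the plan's nested select + listify: keep only non-null matches.
        {"p": p, "match": [i for _, i in group if i is not None]}
        for p, group in groupby(ordered, key=lambda row: row[0])
    ]


def group_by_position(ps, data):
    """Order-preserving alternative: one output record per input position."""
    return [
        {"p": p, "match": [rec["id"] for rec in data if rec["content"] == p]}
        for p in ps
    ]


observed = sort_then_group(left_outer_join(PS, TDATA))
expected = group_by_position(PS, TDATA)
print(observed)
print(expected)
```

Under these assumptions, observed has three records, with the ids for "b" and "c" listed twice, while expected has five records in the order of the input list. The difference comes only from the grouping key: grouping on the value merges duplicate keys, whereas grouping on the position in the list would keep them apart.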



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
