asterixdb-notifications mailing list archives

From "Jianfeng Jia (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset
Date Tue, 10 Nov 2015 23:49:10 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999611#comment-14999611 ]

Jianfeng Jia edited comment on ASTERIXDB-1168 at 11/10/15 11:49 PM:
--------------------------------------------------------------------

[~tillw] Using a join instead of a left outer join did partially solve my query problem, because
now I can build a hash map in the client to identify the keys that were not returned.
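A minimal Python sketch of that client-side workaround. It assumes the inner-join query returns a flat list of (key, id) pairs in any order, possibly with repeats and with no row for unmatched keys; this response shape and the helper name `restore_order` are hypothetical, not part of AsterixDB's REST API. Walking the original key list rebuilds the left-outer-join result in input order, with an empty match list for keys the join dropped.

```python
from typing import Dict, List, Sequence, Tuple


def restore_order(keys: Sequence[str], pairs: Sequence[Tuple[str, int]]) -> List[dict]:
    """Rebuild one result row per requested key, in request order (hypothetical helper)."""
    # Hash map from key to its distinct matching ids. A key repeated in the
    # request may make the server repeat a pair, so ids are deduplicated.
    matches: Dict[str, List[int]] = {}
    for key, rec_id in pairs:
        ids = matches.setdefault(key, [])
        if rec_id not in ids:
            ids.append(rec_id)
    # Iterate the original keys so order and cardinality follow the request;
    # keys the inner join did not return get an empty match list.
    return [{"p": key, "match": list(matches.get(key, []))} for key in keys]


if __name__ == "__main__":
    ps = ["b", "a", "b", "c", "c", "d"]  # "d" is an illustrative key with no match
    # Assumed inner-join response: re-sorted, duplicated, no row for "d".
    pairs = [("a", 1), ("b", 2), ("b", 2), ("c", 3), ("c", 3)]
    for row in restore_order(ps, pairs):
        print(row)
```

Each output row gets its own list copy, so a caller that mutates one "b" row does not affect the other.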

It would be very nice if the left outer join (which also returns nulls) could also keep the
input order.
Right now, the returned result seems to apply "uniq" semantics to $ps.

If I run 
----------------------------------------
let $ps := ["b","a", "b","c","c"]
for $p in $ps return $p
----------------------------------------
It should (and does) return 
[ "b"
, "a"
, "b"
, "c"
, "c"
 ]

When each of them looks up a DB record, as in the original query, the result should still have
the same cardinality, like:
[  {"b", x}
, {"a", x}
, {"b", x}
, {"c", x}
, {"c", x}
 ]

instead of
[ "a": [x]
, "b": [x,x]
, "c": [x,x]
]
 



> Should not sort&group after an OrderedList left-join with a dataset
> -------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1168
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Optimizer
>            Reporter: Jianfeng Jia
>
> Hi,
> Here is the context for this issue: I wanted to look up some records in the DB through the
> REST API, in a batch. So I packaged the "keys" into an OrderedList and expected a left outer
> join to give me all matching records in the same order as the query. However, the result was
> re-sorted and grouped, which confused the client-side response handler.
> Here is a synthetic query that emulates a similar use case:
> ---------------------------------------------------------------------------
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
> create type TType as closed {
>   id: int64,
>   content: string
> }
> create dataset TData (TType) primary key id;
> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content": "b"}, {"id":3, "content":"c"}])
> // now let's query on
> let $ps := ["b","a", "b","c","c"]
> for $p in $ps
> return { "p":$p,
> "match": for $x in dataset TData where $x.content = $p return $x.id
> }
> ---------------------------------------------------------------------------
> What I expected is the following:
> ---------------------------------------------------------------------------
> [ { "p": "b", "match": [ 2 ] }
> , { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2 ] }
> , { "p": "c", "match": [ 3 ] }
> , { "p": "c", "match": [ 3 ] }
>  ]
> ---------------------------------------------------------------------------
> The returned result is the following, which is aggregated and re-sorted:
> ---------------------------------------------------------------------------
> [ { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2, 2 ] }
> , { "p": "c", "match": [ 3, 3 ] }
>  ]
> ---------------------------------------------------------------------------
> The optimized logical plan is the following:
> ---------------------------------------------------------------------------
> distribute result [%0->$$4]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange 
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     project ([$$4])
>     -- STREAM_PROJECT  |PARTITIONED|
>       assign [$$4] <- [function-call: asterix:closed-record-constructor, Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
>       -- ASSIGN  |PARTITIONED|
>         project ([$$1, $$9])
>         -- STREAM_PROJECT  |PARTITIONED|
>           exchange 
>           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>             group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
>                       aggregate [$$9] <- [function-call: asterix:listify, Args:[%0->$$10]]
>                       -- AGGREGATE  |LOCAL|
>                         select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
>                         -- STREAM_SELECT  |LOCAL|
>                           nested tuple source
>                           -- NESTED_TUPLE_SOURCE  |LOCAL|
>                    }
>             -- PRE_CLUSTERED_GROUP_BY[$$12, $$13]  |PARTITIONED|
>               exchange 
>               -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                 order (ASC, %0->$$12) (ASC, %0->$$13) 
>                 -- STABLE_SORT [$$12(ASC), $$13(ASC)]  |PARTITIONED|
>                   exchange 
>                   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                     project ([$$10, $$11, $$12, $$13])
>                     -- STREAM_PROJECT  |PARTITIONED|
>                       exchange 
>                       -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                         left outer join (function-call: algebricks:eq, Args:[%0->$$14, %0->$$13])
>                         -- HYBRID_HASH_JOIN [$$13][$$14]  |PARTITIONED|
>                           exchange 
>                           -- HASH_PARTITION_EXCHANGE [$$13]  |PARTITIONED|
>                             unnest $$13 <- function-call: asterix:scan-collection, Args:[%0->$$12]
>                             -- UNNEST  |UNPARTITIONED|
>                               assign [$$12] <- [AOrderedList: [ AString: {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
>                               -- ASSIGN  |UNPARTITIONED|
>                                 empty-tuple-source
>                                 -- EMPTY_TUPLE_SOURCE  |UNPARTITIONED|
>                           exchange 
>                           -- HASH_PARTITION_EXCHANGE [$$14]  |PARTITIONED|
>                             project ([$$10, $$11, $$14])
>                             -- STREAM_PROJECT  |PARTITIONED|
>                               assign [$$11, $$14] <- [TRUE, function-call: asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
>                               -- ASSIGN  |PARTITIONED|
>                                 exchange 
>                                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                   data-scan []<-[$$10, $$2] <- test:TData
>                                   -- DATASOURCE_SCAN  |PARTITIONED|
>                                     exchange 
>                                     -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                       empty-tuple-source
>                                       -- EMPTY_TUPLE_SOURCE 
> ---------------------------------------------------------------------------------
> Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer join?
> We can close this issue if this is the intended design.
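To see where the duplicates go, the plan above can be emulated in a few lines of Python. This is a loose sketch, not AsterixDB code: there is no partitioning, the hybrid hash join is replaced by a nested loop, and the `use_position` variant is hypothetical. It assumes the unnest also produced each element's index and the group-by keyed on that index, which the plan shown does not do. In the real plan the group key is ($$12, $$13); because $$12 is the same list constant for every tuple, the key effectively reduces to the element value $$13.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# The three TData records inserted in the issue, as (id, content).
TDATA = [(1, "a"), (2, "b"), (3, "c")]
PS = ["b", "a", "b", "c", "c"]


def run_plan(ps: List[str], use_position: bool = False) -> List[dict]:
    # UNNEST: one tuple per list element ($$13), tagged with its index.
    unnested = list(enumerate(ps))
    # LEFT OUTER JOIN on content = element. The right side carries $$11 = TRUE,
    # so $$11 is None (null) when an element matched no record.
    joined: List[Tuple[int, str, Optional[int], Optional[bool]]] = []
    for pos, p in unnested:
        hits = [rid for rid, content in TDATA if content == p]
        if hits:
            joined.extend((pos, p, rid, True) for rid in hits)
        else:
            joined.append((pos, p, None, None))
    # Group key: the element value (as in the plan) or, hypothetically, its index.
    key = (lambda t: t[0]) if use_position else (lambda t: t[1])
    # STABLE_SORT on the group key, then PRE_CLUSTERED_GROUP_BY over adjacent tuples.
    joined.sort(key=key)
    result = []
    for _key, group in groupby(joined, key=key):
        rows = list(group)
        # select not(is-null($$11)), then listify the matching ids into "match".
        match = [rid for _pos, _p, rid, flag in rows if flag is not None]
        result.append({"p": rows[0][1], "match": match})
    return result


if __name__ == "__main__":
    print(run_plan(PS))                     # grouped by element value
    print(run_plan(PS, use_position=True))  # hypothetical per-position key
```

In this emulation, keying on the element value puts both "b" tuples (and both "c" tuples) into one group, which reproduces the returned result; keying on a per-element position keeps five groups in input order, which matches the expected result.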



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
