Date: Tue, 10 Nov 2015 23:49:10 +0000 (UTC)
From: "Jianfeng Jia (JIRA)"
To: notifications@asterixdb.incubator.apache.org
Reply-To: dev@asterixdb.incubator.apache.org
Subject: [jira] [Comment Edited] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999611#comment-14999611 ]

Jianfeng Jia edited comment on ASTERIXDB-1168 at 11/10/15 11:49 PM:
--------------------------------------------------------------------

[~tillw] Using a join instead of a left-outer join partially solved my query problem, because now I can build a hashmap on the client to filter out the keys that were not returned. It would still be very nice if the left outer join (which also returns nulls) could preserve the input order. Right now, the returned result seems to apply "uniq" semantics to $ps.
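(For concreteness: the join-based rewrite referred to above is presumably something along the following lines. This is only an illustrative sketch, assuming the test dataverse and TData dataset defined in the issue below; with an inner join, keys that find no match are simply dropped, so the client has to detect the missing ones itself, e.g. via a hashmap of the returned keys.)
----------------------------------------
use dataverse test;

// inner join: keys with no matching record are dropped from the result
let $ps := ["b","a", "b","c","c"]
for $p in $ps
for $x in dataset TData
where $x.content = $p
return { "p": $p, "id": $x.id }
----------------------------------------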
If I run
----------------------------------------
let $ps := ["b","a", "b","c","c"]
for $p in $ps
return $p
----------------------------------------
it should (and does) return

[ "b", "a", "b", "c", "c" ]

When I use each of these values to look up a DB record, as in the original query, the result should still have the same cardinality, like

[ {"b", x}, {"a", x}, {"b", x}, {"c", x}, {"c", x} ]

instead of

[ "a": [x], "b": [x,x], "c": [x,x] ]


was (Author: javierjia):
(previous revision of the same comment; identical except that it lacked the final "instead of ..." example)


> Should not sort&group after an OrderedList left-join with a dataset
> -------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1168
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Optimizer
>            Reporter: Jianfeng Jia
>
> Hi,
> Here is the context for this issue: I wanted to look up some records in the DB through the REST API, and I wanted to do the lookups in a batch. So I packaged the "keys" into an OrderedList and expected a left-outer join to give me all matching records in an order consistent with the query order. However, the result was re-sorted and grouped, which confused the client-side response handler.
> Here is a synthetic query that emulates a similar use case:
> ---------------------------------------------------------------------------
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
>
> create type TType as closed {
>   id: int64,
>   content: string
> }
>
> create dataset TData (TType) primary key id;
>
> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content": "b"}, {"id":3, "content":"c"} ] );
>
> // now let's run the query
> let $ps := ["b","a", "b","c","c"]
> for $p in $ps
> return { "p": $p,
>          "match": for $x in dataset TData where $x.content = $p return $x.id
>        }
> ---------------------------------------------------------------------------
> What I expected is the following:
> ---------------------------------------------------------------------------
> [ { "p": "b", "match": [ 2 ] }
> , { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2 ] }
> , { "p": "c", "match": [ 3 ] }
> , { "p": "c", "match": [ 3 ] }
> ]
> ---------------------------------------------------------------------------
> The returned result is the following; it has been aggregated and re-sorted.
> ---------------------------------------------------------------------------
> [ { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2, 2 ] }
> , { "p": "c", "match": [ 3, 3 ] }
> ]
> ---------------------------------------------------------------------------
> The optimized logical plan is the following:
> ---------------------------------------------------------------------------
> distribute result [%0->$$4]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     project ([$$4])
>     -- STREAM_PROJECT  |PARTITIONED|
>       assign [$$4] <- [function-call: asterix:closed-record-constructor, Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
>       -- ASSIGN  |PARTITIONED|
>         project ([$$1, $$9])
>         -- STREAM_PROJECT  |PARTITIONED|
>           exchange
>           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>             group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
>                 aggregate [$$9] <- [function-call: asterix:listify, Args:[%0->$$10]]
>                 -- AGGREGATE  |LOCAL|
>                   select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
>                   -- STREAM_SELECT  |LOCAL|
>                     nested tuple source
>                     -- NESTED_TUPLE_SOURCE  |LOCAL|
>             }
>             -- PRE_CLUSTERED_GROUP_BY[$$12, $$13]  |PARTITIONED|
>               exchange
>               -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                 order (ASC, %0->$$12) (ASC, %0->$$13)
>                 -- STABLE_SORT [$$12(ASC), $$13(ASC)]  |PARTITIONED|
>                   exchange
>                   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                     project ([$$10, $$11, $$12, $$13])
>                     -- STREAM_PROJECT  |PARTITIONED|
>                       exchange
>                       -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                         left outer join (function-call: algebricks:eq, Args:[%0->$$14, %0->$$13])
>                         -- HYBRID_HASH_JOIN [$$13][$$14]  |PARTITIONED|
>                           exchange
>                           -- HASH_PARTITION_EXCHANGE [$$13]  |PARTITIONED|
>                             unnest $$13 <- function-call: asterix:scan-collection, Args:[%0->$$12]
>                             -- UNNEST  |UNPARTITIONED|
>                               assign [$$12] <- [AOrderedList: [ AString: {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
>                               -- ASSIGN  |UNPARTITIONED|
>                                 empty-tuple-source
>                                 -- EMPTY_TUPLE_SOURCE  |UNPARTITIONED|
>                           exchange
>                           -- HASH_PARTITION_EXCHANGE [$$14]  |PARTITIONED|
>                             project ([$$10, $$11, $$14])
>                             -- STREAM_PROJECT  |PARTITIONED|
>                               assign [$$11, $$14] <- [TRUE, function-call: asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
>                               -- ASSIGN  |PARTITIONED|
>                                 exchange
>                                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                   data-scan []<-[$$10, $$2] <- test:TData
>                                   -- DATASOURCE_SCAN  |PARTITIONED|
>                                     exchange
>                                     -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                       empty-tuple-source
>                                       -- EMPTY_TUPLE_SOURCE
> ---------------------------------------------------------------------------
> Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer join?
> We can close this issue if this is an intended design.
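A possible, untested workaround sketch (not from the original thread): carry an explicit position alongside each key, so that even if the engine regroups the output, the client can restore the request order and cardinality from the "pos" field. The "pos" and "key" field names below are made up for illustration, and the query assumes the test dataverse and TData dataset from the issue above.
----------------------------------------
use dataverse test;

// each lookup key is tagged with its position in the request
let $ps := [ {"pos": 1, "key": "b"}
           , {"pos": 2, "key": "a"}
           , {"pos": 3, "key": "b"}
           , {"pos": 4, "key": "c"}
           , {"pos": 5, "key": "c"} ]
for $p in $ps
return { "pos": $p.pos,
         "p": $p.key,
         "match": for $x in dataset TData where $x.content = $p.key return $x.id
       }
----------------------------------------
Because duplicate keys now carry distinct "pos" values, they should no longer collapse into a single group, and sorting the returned records by "pos" on the client side recovers the original request order. Whether this formulation also avoids the extra STABLE_SORT + PRE_CLUSTERED_GROUP_BY in the plan has not been verified.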