Date: Tue, 10 Nov 2015 23:49:10 +0000 (UTC)
From: "Jianfeng Jia (JIRA)"
To: notifications@asterixdb.incubator.apache.org
Reply-To: dev@asterixdb.incubator.apache.org
Subject: [jira] [Comment Edited] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999611#comment-14999611 ]

Jianfeng Jia edited comment on ASTERIXDB-1168 at 11/10/15 11:49 PM:
--------------------------------------------------------------------

[~tillw] Using a join instead of a left-outer join partially solved my query problem, because now I can build a hashmap on the client to filter out the keys that were not returned. It would still be very nice if the left outer join (which also returns nulls) could preserve the input order. Right now, the returned result seems to apply "uniq" semantics to $ps.
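(For concreteness: the join-based rewrite referred to above is presumably something along the following lines. This is only an illustrative sketch, assuming the test dataverse and TData dataset defined in the issue below; with an inner join, keys that find no match are simply dropped, so the client has to detect the missing ones itself, e.g. via a hashmap of the returned keys.)
----------------------------------------
use dataverse test;

// inner join: keys with no matching record are dropped from the result
let $ps := ["b","a", "b","c","c"]
for $p in $ps
for $x in dataset TData
where $x.content = $p
return { "p": $p, "id": $x.id }
----------------------------------------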
If I run
----------------------------------------
let $ps := ["b","a", "b","c","c"]
for $p in $ps
return $p
----------------------------------------
it should (and does) return

[ "b", "a", "b", "c", "c" ]

When I use each of these values to look up a DB record, as in the original query, the result should still have the same cardinality, like

[ {"b", x}, {"a", x}, {"b", x}, {"c", x}, {"c", x} ]

instead of

[ "a": [x], "b": [x,x], "c": [x,x] ]


was (Author: javierjia):
(previous revision of the same comment; identical except that it lacked the final "instead of ..." example)


> Should not sort&group after an OrderedList left-join with a dataset
> -------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1168
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Optimizer
>            Reporter: Jianfeng Jia
>
> Hi,
> Here is the context for this issue: I wanted to look up some records in the DB through the REST API, and I wanted to do the lookups in a batch. So I packaged the "keys" into an OrderedList and expected a left-outer join to give me all matching records in an order consistent with the query order. However, the result was re-sorted and grouped, which confused the client-side response handler.
> Here is a synthetic query that emulates a similar use case:
> ---------------------------------------------------------------------------
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
>
> create type TType as closed {
>   id: int64,
>   content: string
> }
>
> create dataset TData (TType) primary key id;
>
> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content": "b"}, {"id":3, "content":"c"} ] );
>
> // now let's run the query
> let $ps := ["b","a", "b","c","c"]
> for $p in $ps
> return { "p": $p,
>          "match": for $x in dataset TData where $x.content = $p return $x.id
>        }
> ---------------------------------------------------------------------------
> What I expected is the following:
> ---------------------------------------------------------------------------
> [ { "p": "b", "match": [ 2 ] }
> , { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2 ] }
> , { "p": "c", "match": [ 3 ] }
> , { "p": "c", "match": [ 3 ] }
> ]
> ---------------------------------------------------------------------------
> The returned result is the following; it has been aggregated and re-sorted.
> ---------------------------------------------------------------------------
> [ { "p": "a", "match": [ 1 ] }
> , { "p": "b", "match": [ 2, 2 ] }
> , { "p": "c", "match": [ 3, 3 ] }
> ]
> ---------------------------------------------------------------------------
> The optimized logical plan is the following:
> ---------------------------------------------------------------------------
> distribute result [%0->$$4]
> -- DISTRIBUTE_RESULT  |PARTITIONED|
>   exchange
>   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>     project ([$$4])
>     -- STREAM_PROJECT  |PARTITIONED|
>       assign [$$4] <- [function-call: asterix:closed-record-constructor, Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
>       -- ASSIGN  |PARTITIONED|
>         project ([$$1, $$9])
>         -- STREAM_PROJECT  |PARTITIONED|
>           exchange
>           -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>             group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
>                 aggregate [$$9] <- [function-call: asterix:listify, Args:[%0->$$10]]
>                 -- AGGREGATE  |LOCAL|
>                   select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
>                   -- STREAM_SELECT  |LOCAL|
>                     nested tuple source
>                     -- NESTED_TUPLE_SOURCE  |LOCAL|
>             }
>             -- PRE_CLUSTERED_GROUP_BY[$$12, $$13]  |PARTITIONED|
>               exchange
>               -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                 order (ASC, %0->$$12) (ASC, %0->$$13)
>                 -- STABLE_SORT [$$12(ASC), $$13(ASC)]  |PARTITIONED|
>                   exchange
>                   -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                     project ([$$10, $$11, $$12, $$13])
>                     -- STREAM_PROJECT  |PARTITIONED|
>                       exchange
>                       -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                         left outer join (function-call: algebricks:eq, Args:[%0->$$14, %0->$$13])
>                         -- HYBRID_HASH_JOIN [$$13][$$14]  |PARTITIONED|
>                           exchange
>                           -- HASH_PARTITION_EXCHANGE [$$13]  |PARTITIONED|
>                             unnest $$13 <- function-call: asterix:scan-collection, Args:[%0->$$12]
>                             -- UNNEST  |UNPARTITIONED|
>                               assign [$$12] <- [AOrderedList: [ AString: {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
>                               -- ASSIGN  |UNPARTITIONED|
>                                 empty-tuple-source
>                                 -- EMPTY_TUPLE_SOURCE  |UNPARTITIONED|
>                           exchange
>                           -- HASH_PARTITION_EXCHANGE [$$14]  |PARTITIONED|
>                             project ([$$10, $$11, $$14])
>                             -- STREAM_PROJECT  |PARTITIONED|
>                               assign [$$11, $$14] <- [TRUE, function-call: asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
>                               -- ASSIGN  |PARTITIONED|
>                                 exchange
>                                 -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                   data-scan []<-[$$10, $$2] <- test:TData
>                                   -- DATASOURCE_SCAN  |PARTITIONED|
>                                     exchange
>                                     -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>                                       empty-tuple-source
>                                       -- EMPTY_TUPLE_SOURCE
> ---------------------------------------------------------------------------
> Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer join?
> We can close this issue if this is an intended design.
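A possible, untested workaround sketch (not from the original thread): carry an explicit position alongside each key, so that even if the engine regroups the output, the client can restore the request order and cardinality from the "pos" field. The "pos" and "key" field names below are made up for illustration, and the query assumes the test dataverse and TData dataset from the issue above.
----------------------------------------
use dataverse test;

// each lookup key is tagged with its position in the request
let $ps := [ {"pos": 1, "key": "b"}
           , {"pos": 2, "key": "a"}
           , {"pos": 3, "key": "b"}
           , {"pos": 4, "key": "c"}
           , {"pos": 5, "key": "c"} ]
for $p in $ps
return { "pos": $p.pos,
         "p": $p.key,
         "match": for $x in dataset TData where $x.content = $p.key return $x.id
       }
----------------------------------------
Because duplicate keys now carry distinct "pos" values, they should no longer collapse into a single group, and sorting the returned records by "pos" on the client side recovers the original request order. Whether this formulation also avoids the extra STABLE_SORT + PRE_CLUSTERED_GROUP_BY in the plan has not been verified.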