asterixdb-dev mailing list archives

From Yingyi Bu <buyin...@gmail.com>
Subject Re: [jira] [Created] (ASTERIXDB-1168) Should not sort&group after an OrderedList left-join with a dataset
Date Tue, 10 Nov 2015 23:20:16 GMT
Ah, yes!
So this should be a bug then...

Best,
Yingyi


On Tue, Nov 10, 2015 at 3:15 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
wrote:

> Actually, I’m still confused about the “cardinality” here. Isn’t the
> cardinality of $ps 5?
> >> let $ps := ["b","a", "b","c","c"]
>
>
> > On Nov 10, 2015, at 2:50 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
> >
> > Jianfeng,
> >
> > The result of the query is correct.
> > The cardinality of returned results should be the same as the number of
> > input binding tuples for $p.
> >
> > Best,
> > Yingyi
> >
> >
> > On Tue, Nov 10, 2015 at 2:34 PM, Jianfeng Jia (JIRA) <jira@apache.org>
> > wrote:
> >
> >> Jianfeng Jia created ASTERIXDB-1168:
> >> ---------------------------------------
> >>
> >>             Summary: Should not sort&group after an OrderedList left-join with a dataset
> >>                 Key: ASTERIXDB-1168
> >>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1168
> >>             Project: Apache AsterixDB
> >>          Issue Type: Bug
> >>          Components: Optimizer
> >>            Reporter: Jianfeng Jia
> >>
> >>
> >> Hi,
> >> Here is the context for this issue: I wanted to look up some records in
> >> the DB through the REST API, and I wanted to look them up in a batch. So I
> >> packaged the "keys" into an OrderedList and expected a left outer join to
> >> give me all matching records in an order consistent with the query order.
> >> However, the result was re-sorted and grouped, which confused the
> >> client-side response handler.
> >>
> >> Here is a synthetic query that emulates a similar use case:
> >>
> ---------------------------------------------------------------------------
> >> drop dataverse test if exists;
> >> create dataverse test;
> >>
> >> use dataverse test;
> >>
> >> create type TType as closed {
> >>  id: int64,
> >>  content: string
> >> }
> >>
> >> create dataset TData (TType) primary key id;
> >>
> >> insert into dataset TData ( [ {"id":1, "content":"a"}, {"id":2, "content": "b"},
> >> {"id":3, "content":"c"}])
> >>
> >> // now let's query it
> >> let $ps := ["b","a", "b","c","c"]
> >>
> >> for $p in $ps
> >> return { "p":$p,
> >> "match": for $x in dataset TData where $x.content = $p return $x.id
> >> }
> >>
> ---------------------------------------------------------------------------
> >>
> >> What I expected is the following:
> >>
> ---------------------------------------------------------------------------
> >> [ { "p": "b", "match": [ 2 ] }
> >> , { "p": "a", "match": [ 1 ] }
> >> , { "p": "b", "match": [ 2 ] }
> >> , { "p": "c", "match": [ 3 ] }
> >> , { "p": "c", "match": [ 3 ] }
> >> ]
> >>
> ---------------------------------------------------------------------------
> >>
> >> The returned result is the following, which is aggregated and re-sorted:
> >>
> ---------------------------------------------------------------------------
> >> [ { "p": "a", "match": [ 1 ] }
> >> , { "p": "b", "match": [ 2, 2 ] }
> >> , { "p": "c", "match": [ 3, 3 ] }
> >> ]
> >>
> ---------------------------------------------------------------------------
> >>
> >> The optimized logical plan is the following:
> >>
> ---------------------------------------------------------------------------
> >> distribute result [%0->$$4]
> >> -- DISTRIBUTE_RESULT  |PARTITIONED|
> >>  exchange
> >>  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> >>    project ([$$4])
> >>    -- STREAM_PROJECT  |PARTITIONED|
> >>      assign [$$4] <- [function-call: asterix:closed-record-constructor,
> >> Args:[AString: {p}, %0->$$1, AString: {match}, %0->$$9]]
> >>      -- ASSIGN  |PARTITIONED|
> >>        project ([$$1, $$9])
> >>        -- STREAM_PROJECT  |PARTITIONED|
> >>          exchange
> >>          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> >>            group by ([$$0 := %0->$$12; $$1 := %0->$$13]) decor ([]) {
> >>                      aggregate [$$9] <- [function-call: asterix:listify,
> >> Args:[%0->$$10]]
> >>                      -- AGGREGATE  |LOCAL|
> >>                        select (function-call: algebricks:not,
> >> Args:[function-call: algebricks:is-null, Args:[%0->$$11]])
> >>                        -- STREAM_SELECT  |LOCAL|
> >>                          nested tuple source
> >>                          -- NESTED_TUPLE_SOURCE  |LOCAL|
> >>                   }
> >>            -- PRE_CLUSTERED_GROUP_BY[$$12, $$13]  |PARTITIONED|
> >>              exchange
> >>              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> >>                order (ASC, %0->$$12) (ASC, %0->$$13)
> >>                -- STABLE_SORT [$$12(ASC), $$13(ASC)]  |PARTITIONED|
> >>                  exchange
> >>                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> >>                    project ([$$10, $$11, $$12, $$13])
> >>                    -- STREAM_PROJECT  |PARTITIONED|
> >>                      exchange
> >>                      -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> >>                        left outer join (function-call: algebricks:eq,
> >> Args:[%0->$$14, %0->$$13])
> >>                        -- HYBRID_HASH_JOIN [$$13][$$14]  |PARTITIONED|
> >>                          exchange
> >>                          -- HASH_PARTITION_EXCHANGE [$$13]  |PARTITIONED|
> >>                            unnest $$13 <- function-call:
> >> asterix:scan-collection, Args:[%0->$$12]
> >>                            -- UNNEST  |UNPARTITIONED|
> >>                              assign [$$12] <- [AOrderedList: [ AString:
> >> {b}, AString: {a}, AString: {b}, AString: {c}, AString: {c} ]]
> >>                              -- ASSIGN  |UNPARTITIONED|
> >>                                empty-tuple-source
> >>                                -- EMPTY_TUPLE_SOURCE  |UNPARTITIONED|
> >>                          exchange
> >>                          -- HASH_PARTITION_EXCHANGE [$$14]  |PARTITIONED|
> >>                            project ([$$10, $$11, $$14])
> >>                            -- STREAM_PROJECT  |PARTITIONED|
> >>                              assign [$$11, $$14] <- [TRUE, function-call:
> >> asterix:field-access-by-index, Args:[%0->$$2, AInt32: {1}]]
> >>                              -- ASSIGN  |PARTITIONED|
> >>                                exchange
> >>                                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> >>                                  data-scan []<-[$$10, $$2] <- test:TData
> >>                                  -- DATASOURCE_SCAN  |PARTITIONED|
> >>                                    exchange
> >>                                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
> >>                                      empty-tuple-source
> >>                                      -- EMPTY_TUPLE_SOURCE
> >>
> >>
> ---------------------------------------------------------------------------------
> >>
> >> Why is there a STABLE_SORT + PRE_CLUSTERED_GROUP_BY after the left outer
> >> join?
> >> We can close this issue if this is the intended design.
> >>
> >>
> >>
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.3.4#6332)
> >>
>
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>
>
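
The gap between the expected and the returned result in the quoted issue can be reproduced outside AsterixDB with a small model. The sketch below is plain Python, not AsterixDB code; the function names (`left_outer_join`, `expected`, `group_by_value`) are made up for illustration. It assumes, from reading the quoted plan, that the group-by keys ($$12, the constant list, and $$13, the unnested value of $p) cannot tell apart two binding tuples that carry the same value, so duplicates in $ps collapse into one group.

```python
# Toy model of the query in ASTERIXDB-1168. This is NOT AsterixDB code;
# all function names here are hypothetical and exist only for illustration.
TDATA = [
    {"id": 1, "content": "a"},
    {"id": 2, "content": "b"},
    {"id": 3, "content": "c"},
]
PS = ["b", "a", "b", "c", "c"]


def left_outer_join(ps, dataset):
    """Pair each $p with every matching record id, or None if nothing matches."""
    rows = []
    for p in ps:
        ids = [rec["id"] for rec in dataset if rec["content"] == p]
        rows.extend((p, rec_id) for rec_id in (ids or [None]))
    return rows


def expected(ps, dataset):
    """One output record per binding tuple of $p, in list order."""
    return [
        {"p": p, "match": [rec["id"] for rec in dataset if rec["content"] == p]}
        for p in ps
    ]


def group_by_value(ps, dataset):
    """Sort the joined rows, then group on the value of $p alone.

    Assumption: this mirrors the STABLE_SORT + PRE_CLUSTERED_GROUP_BY pair
    in the quoted plan, whose keys do not identify individual binding tuples.
    """
    groups = {}
    for p, rec_id in sorted(left_outer_join(ps, dataset), key=lambda r: r[0]):
        groups.setdefault(p, [])
        if rec_id is not None:  # mirrors the not(is-null(...)) select
            groups[p].append(rec_id)
    return [{"p": p, "match": ids} for p, ids in groups.items()]


if __name__ == "__main__":
    print(expected(PS, TDATA))        # five records, in the order of $ps
    print(group_by_value(PS, TDATA))  # three records, sorted by p
```

In this model, `expected` yields five records, one per binding tuple of $p, while `group_by_value` yields three records with the duplicated ids merged, which matches the two result listings in the issue.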
