asterixdb-notifications mailing list archives

From "Taewoo Kim (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ASTERIXDB-1779) Processing the certain function predicates after a simple predicates
Date Wed, 01 Feb 2017 19:05:51 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848791#comment-15848791 ]

Taewoo Kim edited comment on ASTERIXDB-1779 at 2/1/17 7:05 PM:
---------------------------------------------------------------

Certain functions, especially text functions and spatial functions, are more expensive than
simple comparison functions. Based on this, I think we can slightly change the order of
predicates. Interestingly, the original order of the predicates is not preserved in the
current codebase anyway. The point is that I would like to postpone expensive function
evaluations to the end.

An example of how the master branch currently optimizes such a query:
{code}
let $ts := datetime("2010-12-12T00:00:00Z")
let $region := create-rectangle(create-point(0.0,0.0),create-point(100.0,100.0))
let $keyword := "verizon"
for $t in dataset TweetMessages
where $t.send-time > $ts
    and spatial-intersect($t.user.sender-location, $region)
    and contains($t.message-text, $keyword)
return $t
{code}

Final Plan
{code}
distribute result [%0->$$3]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$3])
    -- STREAM_PROJECT  |PARTITIONED|
      select (function-call: algebricks:and, Args:[function-call: asterix:spatial-intersect, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$21, AInt32: {6}], ARectangle: { p1: APoint: { x: 0.0, y: 0.0 }, p2: APoint: { x: 100.0, y: 100.0 }}], function-call: asterix:contains, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$3, AInt32: {4}], AString: {verizon}], function-call: algebricks:gt, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$3, AInt32: {2}], ADateTime: { 2010-12-12T00:00:00.000Z }]])
      -- STREAM_SELECT  |PARTITIONED|
        assign [$$21] <- [function-call: asterix:field-access-by-index, Args:[%0->$$3, AInt32: {1}]]
        -- ASSIGN  |PARTITIONED|
          project ([$$3])
          -- STREAM_PROJECT  |PARTITIONED|
            exchange
            -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
              data-scan []<-[$$17, $$3] <- TinySocial:TweetMessages
              -- DATASOURCE_SCAN  |PARTITIONED|
                exchange
                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                  empty-tuple-source
                  -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
{code}
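
In the plan above, the select lists its conjuncts as spatial-intersect, contains, gt, so the cheap comparison ends up last. A minimal sketch of the proposed reordering follows, written in Python rather than as actual optimizer code. The COST_RANK table and the reorder_conjuncts function are hypothetical names used only for illustration; they are not part of AsterixDB, and the ranking is an assumption rather than a measured cost model.

```python
# Hypothetical cost ranks: a higher rank means "evaluate later".
# The function names mirror those in the plan above; the ranks are assumed.
COST_RANK = {
    "spatial-intersect": 1,  # spatial function, assumed expensive
    "contains": 1,           # text function, assumed expensive
    "edit-distance": 1,      # text function, assumed expensive
}


def reorder_conjuncts(conjuncts):
    """Return the conjuncts with cheap predicates first.

    Functions missing from COST_RANK get rank 0 (cheap). sorted() is
    stable, so predicates with the same rank keep their relative order.
    """
    return sorted(conjuncts, key=lambda name: COST_RANK.get(name, 0))


if __name__ == "__main__":
    plan_order = ["spatial-intersect", "contains", "gt"]
    print(reorder_conjuncts(plan_order))
```

With the argument order taken from the select above, the sketch moves gt to the front and leaves spatial-intersect and contains in their existing relative order.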



> Processing the certain function predicates after a simple predicates
> --------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1779
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1779
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>
> For example, if we have the following AQL query,
> {code}
> for $i in dataset MyData
>    where $i.id < 5 and edit-distance($i.name, "Arnold") < 2
>    return $i;
> {code}
> It may be better to process the *$i.id < 5* predicate first and then the *edit-distance($i.name, "Arnold")* predicate,
> since the latter is more expensive to evaluate than the former.
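
As a rough illustration of the saving, the following Python sketch counts how often the expensive predicate runs under each ordering when the conjunction is evaluated with short-circuiting. The sample records, the run_query helper, and the edit_distance implementation (a plain Levenshtein distance) are assumptions made for this example; they are not AsterixDB's data or its edit-distance implementation.

```python
def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def run_query(records, cheap_first):
    """Filter records and report how many times edit_distance was called."""
    calls = 0

    def expensive(r):
        nonlocal calls
        calls += 1
        return edit_distance(r["name"], "Arnold") < 2

    def cheap(r):
        return r["id"] < 5

    if cheap_first:
        result = [r for r in records if cheap(r) and expensive(r)]
    else:
        result = [r for r in records if expensive(r) and cheap(r)]
    return result, calls


# Hypothetical sample data standing in for the MyData dataset.
names = ["Arnold", "Arnald", "Bernard", "Arnold",
         "Ronald", "Arnolt", "Arnold", "Donald"]
records = [{"id": i, "name": n} for i, n in enumerate(names)]

if __name__ == "__main__":
    for cheap_first in (True, False):
        result, calls = run_query(records, cheap_first)
        print(cheap_first, [r["id"] for r in result], calls)
```

Both orderings return the same records. On this sample, evaluating *$i.id < 5* first calls edit_distance 5 times instead of 8, because records that fail the cheap predicate never reach the expensive one.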
 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
