asterixdb-dev mailing list archives

From Wail Alkowaileet <>
Subject [Discuss] Inlining assign operator
Date Fri, 08 Dec 2017 06:30:13 GMT
Hi Devs,

I've been in the Algebricks vicinity lately and I think there are a few
things we can do to reduce the plan size and probably the execution time. I
will file a JIRA issue for other things I noticed.

First I want to discuss the current use of the Assign operator as I need it
for my current work.

Let's see an example:
*-- Query:*

SELECT t.text as text, as city
FROM Tweets as t
WHERE t.retweet_count > 10
AND spatial_intersect (t.geo.coordinates.coordinates,
    create_rectangle(create_point(-107.27, 33.06), create_point(-89.1,

*-- Plan:*

distribute result [$$19]
    project ([$$19])
      assign [$$19] <- [{"text": $$t.getField("text"), "city":
        project ([$$t, $$25])
          select (and(gt($$t.getField("retweet_count"), 10),
spatial-intersect($$27.getField("coordinates"), rectangle: { p1: point: {
x: -107.27, y: 33.06 }, p2: point: { x: -89.1, y: 38.9 }})))
            assign [$$27, $$25] <-
[$$t.getField("geo").getField("coordinates"), $$t.getField("place")]
            -- ASSIGN  |PARTITIONED|
              project ([$$t])
                  data-scan []<-[$$20, $$t] <- TwitterDataverse.Tweets
                  -- DATASOURCE_SCAN  |PARTITIONED|
                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                      -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

*-- Observation:*

- In this example, *assign [$$27, $$25]* evaluates
*$$t.getField("geo").getField("coordinates")* ($$27) even though it might
not be used (the AND can short-circuit).
- Similarly, because *assign [$$27, $$25]* evaluates *$$t.getField("place")*
($$25) much earlier, project ([$$t, $$25]) is larger than
project ([$$t]), even though $$25 can be evaluated from $$t later.
- We can see that this Assign does not do anything good in this case and
probably should be removed.
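
To make the first observation concrete, here is a toy sketch in Python (not Algebricks code). Dicts stand in for records, intersects() is a hypothetical stand-in for spatial_intersect, and the sample tweets are made up. It only counts how often the nested field access runs when it sits in an Assign below the Select versus when it is inlined into the AND:

```python
# Toy model only: dicts stand in for records, and a counter tracks how often
# the nested field access geo.coordinates is evaluated.
calls = {"geo": 0}

def geo_coordinates(t):
    calls["geo"] += 1
    return t["geo"]["coordinates"]

def intersects(coords):
    # Hypothetical stand-in for spatial_intersect against the query rectangle.
    x, y = coords
    return -107.27 <= x <= -89.1 and 33.06 <= y <= 38.9

# Made-up sample data, not taken from any real dataset.
tweets = [
    {"retweet_count": 3,  "geo": {"coordinates": (-100.0, 35.0)}},
    {"retweet_count": 50, "geo": {"coordinates": (-100.0, 35.0)}},
    {"retweet_count": 12, "geo": {"coordinates": (10.0, 50.0)}},
    {"retweet_count": 0,  "geo": {"coordinates": (-95.0, 34.0)}},
]

def select_with_assign(rows):
    # Assign below the select: the field access runs for every tuple.
    out = []
    for t in rows:
        v27 = geo_coordinates(t)
        if t["retweet_count"] > 10 and intersects(v27):
            out.append(t)
    return out

def select_inlined(rows):
    # Field access inlined into the AND: skipped when the first conjunct fails.
    return [t for t in rows
            if t["retweet_count"] > 10 and intersects(geo_coordinates(t))]

calls["geo"] = 0
eager_result = select_with_assign(tweets)
eager_calls = calls["geo"]

calls["geo"] = 0
lazy_result = select_inlined(tweets)
lazy_calls = calls["geo"]
```

Both variants return the same tuples; only the number of field accesses differs, and the gap grows with the selectivity of the first conjunct.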

There are two policies, but I'm not sure which one is better:
1- Aggressively push down field access to fit more tuples/frame, but this might
do unnecessary evaluation, as in the example above.
2- Push down SELECT, evaluate only the expressions common with the SELECT, and
then do field access afterwards. But this might fit fewer tuples/frame.
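
A back-of-the-envelope sketch of the tuples/frame side of this trade-off, in Python. The frame size and the byte sizes of $$t and $$25 are assumptions picked for illustration, and frame bookkeeping overhead is ignored:

```python
FRAME_BYTES = 32 * 1024  # assumed frame size, for illustration only

def tuples_per_frame(field_sizes, frame_bytes=FRAME_BYTES):
    # Counts payload bytes only; ignores per-frame and per-tuple bookkeeping.
    return frame_bytes // sum(field_sizes)

RECORD = 1000  # hypothetical size in bytes of the full record $$t
PLACE = 100    # hypothetical size in bytes of the extracted field $$25

record_only = tuples_per_frame([RECORD])                # project ([$$t])
record_plus_field = tuples_per_frame([RECORD, PLACE])   # project ([$$t, $$25])
field_only = tuples_per_frame([PLACE])                  # only if $$t could be dropped
```

Under these assumed sizes, early field access only helps frame occupancy when the full record can be projected away; while $$t is still carried, the extra field makes each tuple larger.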

1- An Assign whose result is used only once should be inlined (inline it if
the upper operator can do scalar evaluation, such as select/assign). **Some
plans have two consecutive assigns.
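
A minimal sketch of what such an inlining step could look like, in Python rather than as an actual Algebricks rewrite rule. The tuple-based expression encoding, the dict-based operators, and all function names are hypothetical. It also assumes the inlined variable is not live above the parent operator, which a real rule would have to check:

```python
# Expressions are nested tuples (hypothetical encoding):
#   ("var", name), ("const", value), or ("call", fn, arg, ...)
def count_uses(expr, var):
    if expr[0] == "var":
        return 1 if expr[1] == var else 0
    if expr[0] == "call":
        return sum(count_uses(a, var) for a in expr[2:])
    return 0

def substitute(expr, var, replacement):
    if expr[0] == "var":
        return replacement if expr[1] == var else expr
    if expr[0] == "call":
        return expr[:2] + tuple(substitute(a, var, replacement) for a in expr[2:])
    return expr

SCALAR_OPS = {"select", "assign"}  # operators that can evaluate scalar expressions

def inline_single_use_assign(parent, assign):
    """Inline each assigned variable that the parent uses exactly once.
    Assumes such a variable is not needed above the parent.
    Returns (new_parent, remaining_assign_or_None)."""
    if parent["op"] not in SCALAR_OPS:
        return parent, assign
    exprs = list(parent["exprs"])
    kept_vars, kept_exprs = [], []
    for var, definition in zip(assign["vars"], assign["exprs"]):
        if sum(count_uses(e, var) for e in exprs) == 1:
            exprs = [substitute(e, var, definition) for e in exprs]
        else:
            kept_vars.append(var)
            kept_exprs.append(definition)
    new_parent = dict(parent, exprs=exprs)
    if not kept_vars:
        return new_parent, None
    return new_parent, dict(assign, vars=kept_vars, exprs=kept_exprs)

def get_field(e, name):
    return ("call", "getField", e, ("const", name))

t = ("var", "$$t")
select = {"op": "select", "exprs": [
    ("call", "and",
     ("call", "gt", get_field(t, "retweet_count"), ("const", 10)),
     ("call", "spatial-intersect",
      get_field(("var", "$$27"), "coordinates"), ("const", "rectangle")))]}
assign = {"op": "assign", "vars": ["$$27", "$$25"],
          "exprs": [get_field(get_field(t, "geo"), "coordinates"),
                    get_field(t, "place")]}

new_select, new_assign = inline_single_use_assign(select, assign)
```

On the plan above, $$27 is folded into the select condition, while $$25 stays in the Assign because the select does not use it.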

I'm leaning toward (2) because IScalarEvaluators are chained and work on a
per-tuple basis (almost an iterator model within a frame), so extra
evaluations can be expensive in terms of function calls.

Any suggestions?

Wail Alkowaileet
