hawq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ruilong Huo (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HAWQ-830) Wrong result in CTE query due to CTE is treated as init plan by planner and evaluated multiple times
Date Fri, 17 Jun 2016 02:51:05 GMT
Ruilong Huo created HAWQ-830:
--------------------------------

             Summary: Wrong result in CTE query due to CTE is treated as init plan by planner
and evaluated multiple times
                 Key: HAWQ-830
                 URL: https://issues.apache.org/jira/browse/HAWQ-830
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Optimizer
            Reporter: Ruilong Huo
            Assignee: Amr El-Helw


In CTE query, if the CTE itself is referenced multiple times, it should be evaluated only
once and then be used multiple time. However, it is treated as init plan and evaluated multiple
times in hawq 1.x and 2.0. This has two issues here:

1. If the query in CTE is "volatile" (i.e., select volatile function) or has side effect (create/drop
object in database), it may generate wrong result

2. The performance of the query is not so efficient since the query in CTE is evaluated multiple
times.

Here is the steps to reproduce:
1) in hawq, CTE is treated as init plan and evaluated 2 times. Thus, the result is incorrect
{noformat}
WITH r AS (SELECT random())
SELECT r1.*, r2.*
FROM r AS r1, r AS r2;
      random       |      random
-------------------+-------------------
 0.519145511090755 | 0.751198637764901
(1 row)

EXPLAIN
WITH r AS (SELECT random())
SELECT r1.*, r2.*
FROM r AS r1, r AS r2;
                          QUERY PLAN
--------------------------------------------------------------
 Nested Loop  (cost=0.04..0.77 rows=20 width=16)
   ->  Result  (cost=0.01..0.02 rows=1 width=0)
         InitPlan
           ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Materialize  (cost=0.03..0.09 rows=6 width=8)
         ->  Result  (cost=0.01..0.02 rows=1 width=0)
               InitPlan
                 ->  Result  (cost=0.00..0.01 rows=1 width=0)
 Settings:  default_hash_table_bucket_number=6
 Optimizer status: legacy query optimizer
(10 rows)
{noformat}
2) in postgres, CTE is treated as CTE scan and evaluated 1 time. Thus, the result is i
{noformat}
WITH r AS (SELECT random())
SELECT r1.*, r2.*
FROM r AS r1, r AS r2;
      random       |      random
-------------------+-------------------
 0.989214501809329 | 0.989214501809329
(1 row)

EXPLAIN
WITH r AS (SELECT random())
SELECT r1.*, r2.*
FROM r AS r1, r AS r2;
                        QUERY PLAN
----------------------------------------------------------
 Nested Loop  (cost=0.01..0.06 rows=1 width=16)
   CTE r
     ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  CTE Scan on r r1  (cost=0.00..0.02 rows=1 width=8)
   ->  CTE Scan on r r2  (cost=0.00..0.02 rows=1 width=8)
(5 rows){noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message