hawq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ruilong Huo (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HAWQ-800) Less tuple is inserted due to data locality information is not refreshed and dispatched in prepared statement
Date Sun, 12 Jun 2016 11:03:20 GMT
Ruilong Huo created HAWQ-800:
--------------------------------

             Summary: Less tuple is inserted due to data locality information is not refreshed
and dispatched in prepared statement
                 Key: HAWQ-800
                 URL: https://issues.apache.org/jira/browse/HAWQ-800
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Core, Dispatcher, Query Execution, Resource Manager
            Reporter: Ruilong Huo
            Assignee: George Caragea


In either explicit (SQL) or implicit (UDF) prepared statement, there is less tuple inserted
if we run a prepared "insert into t select * from t" plan multiple times in a transaction.

Below is a simple case to reproduce this issue. For a more complicated example, you may refer
to attached proba_execute and proba.
1. There should be 8 tuples, however there is only 4 in hawq 2.0
{noformat}
drop table if exists t;
DROP TABLE

create table t (id int);
CREATE TABLE

insert into t values (1);
INSERT 0 1

CREATE OR REPLACE FUNCTION f_load()
RETURNS TEXT
LANGUAGE plpgsql
AS
$body$
DECLARE
    l_rec RECORD;
    l_itm RECORD;
BEGIN

    FOR l_rec IN ( SELECT generate_series(1, 3) AS id )
    LOOP
        INSERT INTO t SELECT * FROM t;
    END LOOP;

    RETURN 'done';

END;
$body$
;
CREATE FUNCTION

SELECT f_load();
 f_load
--------
 done
(1 row)

SELECT * FROM t;
 id
----
  1
  1
  1
  1
(4 rows)
{noformat}

2. There are 8 tuples as expected in hawq 1.x
{noformat}
drop table if exists t;
DROP TABLE

create table t (id int);
psql:temp.sql:3: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named
'id' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s)
chosen are the optimal data distribution key to minimize skew.
CREATE TABLE

insert into t values (1);
INSERT 0 1

CREATE OR REPLACE FUNCTION f_load()
RETURNS TEXT
LANGUAGE plpgsql
AS
$body$
DECLARE
    l_rec RECORD;
    l_itm RECORD;

BEGIN

    FOR l_rec IN ( SELECT generate_series(1, 3) AS id )
    LOOP
        INSERT INTO t SELECT * FROM t;
    END LOOP;

    RETURN 'done';

END;
$body$
;
CREATE FUNCTION

SELECT f_load();
 f_load
--------
 done
(1 row)

SELECT * FROM t;
 id
----
  1
  1
  1
  1
  1
  1
  1
  1
(8 rows)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message