phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gerald Sangudi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-5353) Incorrect results when JOINs project array elements
Date Tue, 18 Jun 2019 17:33:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Gerald Sangudi updated PHOENIX-5353:
------------------------------------
    Description: 
There are two errors observed below, one with HASH JOIN and the other with SORT MERGE JOIN.

Here are the steps for reproducing the bug in array element projection when using sort merge
join.
 I also noticed there is an issue with hash-joins (see the last example).

Create tables and upsert test data.
 0: jdbc:phoenix:localhost:2181:/hbase> create table d (c1 integer primary key, c2 integer,
c3 integer[]);
 0: jdbc:phoenix:localhost:2181:/hbase> upsert into d values (1, 1, array[1,2,3]);
 0: jdbc:phoenix:localhost:2181:/hbase> upsert into d values (2, 10, array[2,3,4]);

0: jdbc:phoenix:localhost:2181:/hbase> create table t2 (c1 integer primary key, c2 integer);
 0: jdbc:phoenix:localhost:2181:/hbase> upsert into t2 values (1, 10);
 0: jdbc:phoenix:localhost:2181:/hbase> upsert into t2 values (2, 20);
 Verify data in the tables
 0: jdbc:phoenix:localhost:2181:/hbase> select * from d;
 +------+----++------------
|C1|C2|C3|

+------+----++------------
|1|1|[1, 2, 3]|
|2|10|[2, 3, 4]|

+------+----++------------
 2 rows selected (0.027 seconds)
 0: jdbc:phoenix:localhost:2181:/hbase> select * from t2;
 +------+----+
|C1|C2|

+------+----+
|1|10|
|2|20|

+------+----+
 2 rows selected (0.021 seconds)


 Perform sort merge join without projecting array elements. This works fine.
 0: jdbc:phoenix:localhost:2181:/hbase> select /*+ use_sort_merge_join */ d.c1, d.c2, d.c3,
t2.c1, t2.c2 from d join t2 on d.c1 = t2.c1;
 +--------+------++--------------------++--------
|D.C1|D.C2|D.C3|T2.C1|T2.C2|

+--------+------++--------------------++--------
|1|1|[1, 2, 3]|1|10|
|2|10|[2, 3, 4]|2|20|

+--------+------++--------------------++--------
 2 rows selected (0.054 seconds)


 Perform sort merge join by projecting array elements. Returns junk data for array elements.
 0: jdbc:phoenix:localhost:2181:/hbase> select /*+ use_sort_merge_join */ d.c1, d.c2, d.c3[1],
d.c3[2], d.c3[3], t2.c1, t2.c2 from d join t2 on d.c1 = t2.c1;
 +--------+------++--------------------------------------------++------------------------------++--------
|D.C1|D.C2|ARRAY_ELEM(D.C3, 1)|ARRAY_ELEM(D.C3, 2)|ARRAY_ELEM(D.C3, 3)|T2.C1|T2.C2|

+--------+------++--------------------------------------------++------------------------------++--------
|1|1|-1937768448|-2122317824|-2105540608|1|10|
|2|10|-1937768448|-2105540608|-2088763392|2|20|

+--------+------++--------------------------------------------++------------------------------++--------
 2 rows selected (0.043 seconds)


 Array element projection works fine when using hash-join but columns from the non-array table
are messed up.
 0: jdbc:phoenix:localhost:2181:/hbase> select d.c1, d.c2, d.c3[1], d.c3[2], d.c3[3], t2.c1,
t2.c2 from d join t2 on d.c1 = t2.c1;
 +--------+------++--------------------------------------------++------------------------------------++--------------
|D.C1|D.C2|ARRAY_ELEM(D.C3, 1)|ARRAY_ELEM(D.C3, 2)|ARRAY_ELEM(D.C3, 3)|T2.C1|T2.C2|

+--------+------++--------------------------------------------++------------------------------------++--------------
|1|1|1|2|3|-2146795520|-2147319808|
|2|10|2|3|4|-2146140160|-2147319808|

+--------+------++--------------------------------------------++------------------------------------++--------------
 2 rows selected (0.067 seconds)

  was:
There are two errors observed below, one with HASH JOIN and the other with SORT MERGE JOIN.

Here are the steps for reproducing the bug in array element projection when using sort merge
join.
I also noticed there is an issue with hash-joins (see the last example).

* Create tables and upsert test data.
0: jdbc:phoenix:localhost:2181:/hbase> create table d (c1 integer primary key, c2 integer,
c3 integer[]);
0: jdbc:phoenix:localhost:2181:/hbase> upsert into d values (1, 1, array[1,2,3]);
0: jdbc:phoenix:localhost:2181:/hbase> upsert into d values (2, 10, array[2,3,4]);

0: jdbc:phoenix:localhost:2181:/hbase> create table t2 (c1 integer primary key, c2 integer);
0: jdbc:phoenix:localhost:2181:/hbase> upsert into t2 values (1, 10);
0: jdbc:phoenix:localhost:2181:/hbase> upsert into t2 values (2, 20);
Verify data in the tables
0: jdbc:phoenix:localhost:2181:/hbase> select * from d;
+-----+-----+------------+
| C1  | C2  |     C3     |
+-----+-----+------------+
| 1   | 1   | [1, 2, 3]  |
| 2   | 10  | [2, 3, 4]  |
+-----+-----+------------+
2 rows selected (0.027 seconds)
0: jdbc:phoenix:localhost:2181:/hbase> select * from t2;
+-----+-----+
| C1  | C2  |
+-----+-----+
| 1   | 10  |
| 2   | 20  |
+-----+-----+
2 rows selected (0.021 seconds)
Perform sort merge join without projecting array elements. This works fine.
0: jdbc:phoenix:localhost:2181:/hbase> select /*+ use_sort_merge_join */ d.c1, d.c2, d.c3,
t2.c1, t2.c2 from d join t2 on d.c1 = t2.c1;
+-------+-------+------------+--------+--------+
| D.C1  | D.C2  |    D.C3    | T2.C1  | T2.C2  |
+-------+-------+------------+--------+--------+
| 1     | 1     | [1, 2, 3]  | 1      | 10     |
| 2     | 10    | [2, 3, 4]  | 2      | 20     |
+-------+-------+------------+--------+--------+
2 rows selected (0.054 seconds)
Perform sort merge join by projecting array elements. Returns junk data for array elements.
0: jdbc:phoenix:localhost:2181:/hbase> select /*+ use_sort_merge_join */ d.c1, d.c2, d.c3[1],
d.c3[2], d.c3[3], t2.c1, t2.c2 from d join t2 on d.c1 = t2.c1;
+-------+-------+----------------------+----------------------+----------------------+--------+--------+
| D.C1  | D.C2  | ARRAY_ELEM(D.C3, 1)  | ARRAY_ELEM(D.C3, 2)  | ARRAY_ELEM(D.C3, 3)  | T2.C1
 | T2.C2  |
+-------+-------+----------------------+----------------------+----------------------+--------+--------+
| 1     | 1     | -1937768448          | -2122317824          | -2105540608          | 1 
    | 10     |
| 2     | 10    | -1937768448          | -2105540608          | -2088763392          | 2 
    | 20     |
+-------+-------+----------------------+----------------------+----------------------+--------+--------+
2 rows selected (0.043 seconds)
Array element projection works fine when using hash-join but columns from the non-array table
are messed up.
0: jdbc:phoenix:localhost:2181:/hbase> select  d.c1, d.c2, d.c3[1], d.c3[2], d.c3[3], t2.c1,
t2.c2 from d join t2 on d.c1 = t2.c1;
+-------+-------+----------------------+----------------------+----------------------+--------------+--------------+
| D.C1  | D.C2  | ARRAY_ELEM(D.C3, 1)  | ARRAY_ELEM(D.C3, 2)  | ARRAY_ELEM(D.C3, 3)  |   
T2.C1     |    T2.C2     |
+-------+-------+----------------------+----------------------+----------------------+--------------+--------------+
| 1     | 1     | 1                    | 2                    | 3                    | -2146795520
 | -2147319808  |
| 2     | 10    | 2                    | 3                    | 4                    | -2146140160
 | -2147319808  |
+-------+-------+----------------------+----------------------+----------------------+--------------+--------------+
2 rows selected (0.067 seconds)



> Incorrect results when JOINs project array elements
> ---------------------------------------------------
>
>                 Key: PHOENIX-5353
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5353
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Gerald Sangudi
>            Priority: Major
>
> There are two errors observed below, one with HASH JOIN and the other with SORT MERGE
JOIN.
> Here are the steps for reproducing the bug in array element projection when using sort
merge join.
>  I also noticed there is an issue with hash-joins (see the last example).
> Create tables and upsert test data.
>  0: jdbc:phoenix:localhost:2181:/hbase> create table d (c1 integer primary key, c2
integer, c3 integer[]);
>  0: jdbc:phoenix:localhost:2181:/hbase> upsert into d values (1, 1, array[1,2,3]);
>  0: jdbc:phoenix:localhost:2181:/hbase> upsert into d values (2, 10, array[2,3,4]);
> 0: jdbc:phoenix:localhost:2181:/hbase> create table t2 (c1 integer primary key, c2
integer);
>  0: jdbc:phoenix:localhost:2181:/hbase> upsert into t2 values (1, 10);
>  0: jdbc:phoenix:localhost:2181:/hbase> upsert into t2 values (2, 20);
>  Verify data in the tables
>  0: jdbc:phoenix:localhost:2181:/hbase> select * from d;
>  +------+----++------------
> |C1|C2|C3|
> +------+----++------------
> |1|1|[1, 2, 3]|
> |2|10|[2, 3, 4]|
> +------+----++------------
>  2 rows selected (0.027 seconds)
>  0: jdbc:phoenix:localhost:2181:/hbase> select * from t2;
>  +------+----+
> |C1|C2|
> +------+----+
> |1|10|
> |2|20|
> +------+----+
>  2 rows selected (0.021 seconds)
>  Perform sort merge join without projecting array elements. This works fine.
>  0: jdbc:phoenix:localhost:2181:/hbase> select /*+ use_sort_merge_join */ d.c1, d.c2,
d.c3, t2.c1, t2.c2 from d join t2 on d.c1 = t2.c1;
>  +--------+------++--------------------++--------
> |D.C1|D.C2|D.C3|T2.C1|T2.C2|
> +--------+------++--------------------++--------
> |1|1|[1, 2, 3]|1|10|
> |2|10|[2, 3, 4]|2|20|
> +--------+------++--------------------++--------
>  2 rows selected (0.054 seconds)
>  Perform sort merge join by projecting array elements. Returns junk data for array elements.
>  0: jdbc:phoenix:localhost:2181:/hbase> select /*+ use_sort_merge_join */ d.c1, d.c2,
d.c3[1], d.c3[2], d.c3[3], t2.c1, t2.c2 from d join t2 on d.c1 = t2.c1;
>  +--------+------++--------------------------------------------++------------------------------++--------
> |D.C1|D.C2|ARRAY_ELEM(D.C3, 1)|ARRAY_ELEM(D.C3, 2)|ARRAY_ELEM(D.C3, 3)|T2.C1|T2.C2|
> +--------+------++--------------------------------------------++------------------------------++--------
> |1|1|-1937768448|-2122317824|-2105540608|1|10|
> |2|10|-1937768448|-2105540608|-2088763392|2|20|
> +--------+------++--------------------------------------------++------------------------------++--------
>  2 rows selected (0.043 seconds)
>  Array element projection works fine when using hash-join but columns from the non-array
table are messed up.
>  0: jdbc:phoenix:localhost:2181:/hbase> select d.c1, d.c2, d.c3[1], d.c3[2], d.c3[3],
t2.c1, t2.c2 from d join t2 on d.c1 = t2.c1;
>  +--------+------++--------------------------------------------++------------------------------------++--------------
> |D.C1|D.C2|ARRAY_ELEM(D.C3, 1)|ARRAY_ELEM(D.C3, 2)|ARRAY_ELEM(D.C3, 3)|T2.C1|T2.C2|
> +--------+------++--------------------------------------------++------------------------------------++--------------
> |1|1|1|2|3|-2146795520|-2147319808|
> |2|10|2|3|4|-2146140160|-2147319808|
> +--------+------++--------------------------------------------++------------------------------------++--------------
>  2 rows selected (0.067 seconds)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message