drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zelaine Fong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4684) Incorrect behavior when joining RDBMS table on itself
Date Wed, 18 May 2016 15:50:12 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289181#comment-15289181
] 

Zelaine Fong commented on DRILL-4684:
-------------------------------------

Not sure if this is related to DRILL-4374.  With that Jira, an error is returned as opposed
to wrong result.  Also, latest comment on the Jira indicates the problem wasn't reproducible.

> Incorrect behavior when joining RDBMS table on itself
> -----------------------------------------------------
>
>                 Key: DRILL-4684
>                 URL: https://issues.apache.org/jira/browse/DRILL-4684
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Egbert van der Veen
>         Attachments: actual result.png, expected result.png, result with file query.png
>
>
> When joining a RDBMS table on itself (for instance; with a table that contains both subitems
and main items), Drill fails to correctly retrieve the result columns; the columns belonging
to the first time that the table is referenced, are always NULL. The values that should be
in these columns, instead end up in the columns belonging to the second time the table is
referenced.
> This has been tested using both MySQL and PostgreSQL databases as storage providers.
> See attached screenshots for expected result (which works when queried directly against
the database), and the actual result through Drill
> Code to reproduce:
> {code:sql}
> --create table
> CREATE TABLE `join_test` (
>   `id` int(11) NOT NULL,
>   `type` varchar(50) DEFAULT NULL,
>   `mainItem` int(11) DEFAULT NULL,
>   `description` varchar(255) DEFAULT NULL,
>   PRIMARY KEY (`id`),
>   UNIQUE KEY `id` (`id`)
> ) ;
> --insert testdata; one main item and two subitems. For subitems, column 'mainItem' is
a reference to the parent main item
> INSERT INTO join_test (id, type, mainItem, description) VALUES (1, 'mainItem', null,
'First main item');
> INSERT INTO join_test (id, type, mainItem, description) VALUES (2, 'subItem', 1, 'First
subitem of first main item');
> INSERT INTO join_test (id, type, mainItem, description) VALUES (3, 'subItem', 1, 'Second
subitem of first main item');
> --perform query
> SELECT main.type firsttype, main.description firstdescription, sub.type secondtype, sub.description
seconddescription 
> FROM join_test sub
> JOIN join_test main ON sub.mainItem = main.id
> {code}
> Update:
> The same query *does* work when querying a file through a file storage provider:
> {code:sql}
> CREATE TABLE dfs.tmp.`join_test.parquet` AS select * from mysql.playground.join_test;
> SELECT main.type firsttype, main.description firstdescription, sub.type secondtype, sub.description
seconddescription 
> FROM dfs.tmp.`join_test.parquet` sub
> JOIN dfs.tmp.`join_test.parquet` main ON sub.mainItem = main.id;
> {code}
> (see attachment 'result with file query.png' for the result)
> So the issue appears to be in the way Drill handlers RDBMS data sources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message