drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Timothy Farkas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4211) Inconsistent results from a joined sql statement to postgres tables
Date Thu, 10 Aug 2017 00:41:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120887#comment-16120887
] 

Timothy Farkas commented on DRILL-4211:
---------------------------------------

There seem to be two things going wrong here:

# Doing joins on two tables that have columns with the same names causes incorrect nulls in
the results DRILL-5713
# Column aliases are not pushed down to the JDBCReader, so the columns returned from postgres
have the original names. The Project operators in Drill then proceed to use the aliased column
names to do operations on batches. The aliased names ofcourse don't exist in the schema, so
nulls are returned for the aliased column.

I'll make this ticket focus on the aliasing issue, and I'll fix the column name conflict issue
in DRILL-5713

> Inconsistent results from a joined sql statement to postgres tables
> -------------------------------------------------------------------
>
>                 Key: DRILL-4211
>                 URL: https://issues.apache.org/jira/browse/DRILL-4211
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.3.0, 1.11.0
>         Environment: Postgres db stroage
>            Reporter: Robert Hamilton-Smith
>            Assignee: Timothy Farkas
>              Labels: newbie
>
> When making an sql statement that incorporates a join to a table and then a self join
to that table to get a parent value , Drill brings back inconsistent results. 
> Here is the sql in postgres with correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food&Dining|
> |id1|restaurants|food&Dining|
> |id2|Coffee Shops|food&Dining|
> |id2|Coffee Shops|food&Dining|
> When run in Drill with correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
> 00-00    Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname,
VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {110.0 rows, 110.0 cpu, 0.0 io,
0.0 network, 0.0 memory}, id = 64293
> 00-01      Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : rowType =
RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, VARCHAR(50) parentcat): rowcount
= 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id =
64292
> 00-02        Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) : rowType
= RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, VARCHAR(50) parentcat): rowcount
= 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id =
64291
> 00-03          Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" = "categories0"."categoryguid"])
: rowType = RecordType(VARCHAR(255) transactionguid, VARCHAR(255) relatedtransactionguid,
VARCHAR(255) transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) transactionrefobjecttype,
VARCHAR(255) transactionrefobjectguid, VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6)
transactiondate, VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3)
transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) transactionamount,
DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) transactionnotes, DECIMAL(2, 0) transactioninstrumenttype,
VARCHAR(20) transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, VARCHAR(50)
transactionorigpartyguid, VARCHAR(255) transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid,
VARCHAR(255) transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1,
0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) transactionrecategorized,
TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype,
VARCHAR(50) transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, VARCHAR(50)
transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, TIMESTAMP(6) transactionprocessedat,
TIMESTAMP(6) transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, VARCHAR(50)
transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) categoryguid0, VARCHAR(50)
categoryparentguid, DECIMAL(3, 0) categorytype, VARCHAR(50) categoryname, VARCHAR(50) categorydescription,
VARCHAR(50) partyguid, VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, DECIMAL(3,
0) categorytype0, VARCHAR(50) categoryname0, VARCHAR(50) categorydescription0, VARCHAR(50)
partyguid0): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network,
0.0 memory}, id = 64259
> {code}
> I worked around it by creating a view on postgres but not ideal. Thanks in advance.
> First Drill Jira Bug.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message