drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1778) upper, lower operators sometimes fail the JVM or return duplicated columns
Date Fri, 05 Dec 2014 00:09:13 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234825#comment-14234825
] 

Aman Sinha commented on DRILL-1778:
-----------------------------------

Actually, this is not specific to any functions; it is about combining the '*' column with
any expression.  Without a LIMIT, we produce the right set of columns.  With the LIMIT, there
is an extra column for the expression (see below).  Note that the explain plan below shows
a Project below the Limit and a second Project above the Limit.  The second Project is added
as part of finalColumnReorder() in DefaultSqlHandler and is needed for proper ordering of
columns as well as alias naming.  The problem occurs because ProjectRecordBatch sees '*' and
produces all incoming columns (including the expression column) and then adds the expression
column again to the outgoing batch. 

{code:sql}
0: jdbc:drill:zk=local> select *, n.n_nationkey+1 from cp.`tpch/nation.parquet` n limit 1;
+-------------+------------+-------------+------------+------------+------------+
| n_nationkey |   n_name   | n_regionkey | n_comment  |   EXPR$1   |  EXPR$10   |
+-------------+------------+-------------+------------+------------+------------+
| 0           | ALGERIA    | 0           |  haggle. carefully final deposits detect slyly agai | 1          | 1          |
+-------------+------------+-------------+------------+------------+------------+

0: jdbc:drill:zk=local> explain plan for select *, n.n_nationkey+1 from cp.`tpch/nation.parquet` n limit 1;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      Project(*=[$0], EXPR$1=[$1])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[1])
00-04            Project(*=[$0], EXPR$1=[+($1, 1)])
00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`*`]]])
{code}
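
As a rough illustration of the duplication described above, here is a toy Python model (not Drill code). The function project() and the list-of-pairs batch representation are made up for this sketch; it only assumes that '*' expands to every incoming column and that named expressions are appended afterwards. It does not model the renaming that turns the second copy into EXPR$10.

```python
def project(incoming, items):
    """Toy projection over a batch given as a list of (name, value) pairs.

    '*' expands to every incoming column; any other item copies the
    incoming column that has that name.
    """
    lookup = dict(incoming)
    outgoing = []
    for item in items:
        if item == "*":
            # Star expansion copies ALL incoming columns, including the
            # expression column already computed by the lower Project.
            outgoing.extend(incoming)
        else:
            outgoing.append((item, lookup[item]))
    return outgoing


# Batch leaving the lower Project (operator 00-04) for the first nation row.
after_lower = [
    ("n_nationkey", 0),
    ("n_name", "ALGERIA"),
    ("n_regionkey", 0),
    ("n_comment", "haggle. carefully final deposits detect slyly agai"),
    ("EXPR$1", 1),
]

# The upper Project (operator 00-01) asks for '*' plus EXPR$1.
after_upper = project(after_lower, ["*", "EXPR$1"])
names = [name for name, _ in after_upper]
print(names)
```

In this model the expression column shows up twice in the outgoing batch, which matches the extra column seen in the LIMIT query.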

It seems to me that we need to prefix the base table columns coming out of the first Project
(operator id 4) with something like T1||*, just as we do for select * above a Join.  Then
the second Project (operator 1) will recognize the prefix and do the right thing when
processing *.  To illustrate, the following join query produces correct results:
{code:sql}
0: jdbc:drill:zk=local> select *, n.n_nationkey+1 from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r where n.n_nationkey = r.r_regionkey limit 1;
+-------------+------------+-------------+------------+-------------+------------+------------+------------+
| n_nationkey |   n_name   | n_regionkey | n_comment  | r_regionkey |   r_name   | r_comment  |   EXPR$2   |
+-------------+------------+-------------+------------+-------------+------------+------------+------------+
| 0           | ALGERIA    | 0           |  haggle. carefully final deposits detect slyly agai | 0           | AFRICA     | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to  | 1          |
+-------------+------------+-------------+------------+-------------+------------+------------+------------+
1 row selected (0.345 seconds)
0: jdbc:drill:zk=local> explain plan for select *, n.n_nationkey+1 from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r where n.n_nationkey = r.r_regionkey limit 1;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      Project(*=[$0], *0=[$1], EXPR$2=[$2])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[1])
00-04            Project(T6¦¦*=[$0], T7¦¦*=[$2], EXPR$2=[+($1, 1)])
00-05              HashJoin(condition=[=($1, $3)], joinType=[inner])
00-07                Project(T6¦¦*=[$0], T6¦¦n_nationkey=[$1])
00-09                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, numFiles=1, columns=[`*`]]])
00-06                Project(T7¦¦*=[$0], T7¦¦r_regionkey=[$1])
00-08                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/region.parquet]], selectionRoot=/tpch/region.parquet, numFiles=1, columns=[`*`]]])
{code}
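
A rough sketch of the prefixing idea for the single-table case, again as a toy Python model rather than Drill code. The helper names tag_star_columns() and project_prefixed(), the "||" delimiter, and the "T1" prefix are all assumptions made for illustration; the real change would live in the planner and ProjectRecordBatch and may look quite different.

```python
DELIM = "||"  # illustrative delimiter; the plans above print it as a broken bar pair


def tag_star_columns(columns, prefix):
    """Mark columns that came from the base table's '*' with a table prefix."""
    return [(prefix + DELIM + name, value) for name, value in columns]


def project_prefixed(incoming, items):
    """Toy projection where a prefixed star expands only to tagged columns."""
    lookup = dict(incoming)
    outgoing = []
    for item in items:
        if item.endswith(DELIM + "*"):
            tag = item[:-1]  # e.g. "T1||"
            for name, value in incoming:
                if name.startswith(tag):
                    # Strip the tag so the user sees the original column name.
                    outgoing.append((name[len(tag):], value))
        else:
            outgoing.append((item, lookup[item]))
    return outgoing


base = [
    ("n_nationkey", 0),
    ("n_name", "ALGERIA"),
    ("n_regionkey", 0),
    ("n_comment", "haggle. carefully final deposits detect slyly agai"),
]

# Lower Project: tagged base columns plus the untagged expression column.
after_lower = tag_star_columns(base, "T1") + [("EXPR$1", 1)]

# Upper Project: the star no longer matches the expression column.
result = project_prefixed(after_lower, ["T1" + DELIM + "*", "EXPR$1"])
print([name for name, _ in result])
```

Because the expression column carries no prefix, the star expansion skips it and it appears only once in the result.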

> upper, lower operators sometimes fail the JVM or return duplicated columns
> --------------------------------------------------------------------------
>
>                 Key: DRILL-1778
>                 URL: https://issues.apache.org/jira/browse/DRILL-1778
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Operators
>         Environment: Embedded Mode
>            Reporter: Sean Hsuan-Yi Chu
>            Assignee: Sean Hsuan-Yi Chu
>
> *. The cases the operators pass through 
> 1. select *, upper(TABLE_NAME) from INFORMATION_SCHEMA.`TABLES`;
> 2. select upper(TABLE_NAME), * from INFORMATION_SCHEMA.`TABLES`;
> *. The cases the operators fail
> 1. select upper(first_name),* from cp.`employee.json` limit 2
> => upper(first_name) was repeated "twice". 
> 2. select *, upper(first_name) from cp.`employee.json` limit 2
> => Crashed JVM entirely



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
