drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
Date Fri, 03 Jun 2016 18:55:59 GMT
Jinfeng Ni created DRILL-4707:
---------------------------------

             Summary: Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
                 Key: DRILL-4707
                 URL: https://issues.apache.org/jira/browse/DRILL-4707
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni
            Priority: Critical


On the latest master branch:

{code}
select version, commit_id, commit_message from sys.version;
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
|     version     |                 commit_id                 |                                 commit_message                                  |
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
| 1.7.0-SNAPSHOT  | 3186217e5abe3c6c2c7e504cdb695567ff577e4c  | DRILL-4607: Add a split function that allows to separate string by a delimiter  |
+-----------------+-------------------------------------------+---------------------------------------------------------------------------------+
{code}

If a query has two column names that conflict under the case-insensitive policy, Drill will
either hit a memory leak or return an incorrect result.

Q1.

{code}
select r_regionkey as XYZ, r_name as xyz FROM cp.`tpch/region.parquet`;
Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (131072)
Allocator(op:0:0:1:Project) 1000000/131072/2490368/10000000000 (res/actual/peak/limit)


Fragment 0:0
{code}

Q2: returns only one column in the result, although two were selected.
{code}
select n_nationkey as XYZ, n_regionkey as xyz FROM cp.`tpch/nation.parquet`;
+------+
| XYZ  |
+------+
| 0    |
| 1    |
| 1    |
| 1    |
| 4    |
| 0    |
| 3    |
{code}

The cause of the problem seems to be that the Project operator treats the two incoming columns
as identical (since Drill adopts a case-insensitive policy for column names in execution).
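To illustrate the suspected mechanism, here is a minimal Java sketch of how a case-insensitive name lookup collapses the two aliases from Q2 into one entry. It is not Drill's actual code; the class CaseInsensitiveCollision and the method survivingColumns are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not taken from the Drill code base.
public class CaseInsensitiveCollision {

    // Registers each alias in a map keyed case-insensitively and returns
    // the aliases that are still distinguishable afterwards.
    public static List<String> survivingColumns(List<String> aliases) {
        Map<String, String> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String alias : aliases) {
            // "xyz" compares equal to "XYZ" here, so the second alias is dropped.
            byName.putIfAbsent(alias, alias);
        }
        return new ArrayList<>(byName.values());
    }

    public static void main(String[] args) {
        // Prints [XYZ]: only one of the two projected columns survives.
        System.out.println(survivingColumns(List.of("XYZ", "xyz")));
    }
}
```

Under this assumption, the second column has no name of its own, which would be consistent with Q2 returning a single XYZ column.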

The planner should make sure that the conflicting columns are resolved, since execution is
name-based. 
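One way the planner could resolve such conflicts is to rename any output column whose name matches an earlier one case-insensitively. The sketch below is only an assumption about what such a step might look like, not the fix adopted in Drill; UniqueColumnNames, uniquify, and the numeric-suffix scheme are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a planner-side renaming step, not Drill's implementation.
public class UniqueColumnNames {

    // Returns the names in their original order, appending a numeric suffix to
    // any name that collides case-insensitively with one already emitted.
    public static List<String> uniquify(List<String> names) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String candidate = name;
            int suffix = 0;
            // Names are compared in lower case, mirroring case-insensitive execution.
            while (!seen.add(candidate.toLowerCase(Locale.ROOT))) {
                candidate = name + suffix++;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [XYZ, xyz0]: the two aliases from Q1 and Q2 no longer conflict.
        System.out.println(uniquify(List.of("XYZ", "xyz")));
    }
}
```

With names made unique before execution, a name-based operator such as Project would see two distinct columns.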




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
