drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5553) SELECT *, columns produces nonsense results
Date Tue, 30 May 2017 00:30:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028678#comment-16028678 ]

Paul Rogers commented on DRILL-5553:
------------------------------------

The problem appears to be in the planner, not the CSV reader. The following is a snippet of
the physical plan given to the CSV reader:

{code}
    "columns" : [ "`*`" ],
{code}

As Arina noted elsewhere, the planner "compresses" the "columns" column into * for the purposes
of the scanner, but somehow expands it elsewhere. Since "columns" is special only to the CSV
reader, but not to Drill, the Project operator (perhaps) does not know that "columns" is supposed
to be a Varchar array.


> SELECT *, columns produces nonsense results
> -------------------------------------------
>
>                 Key: DRILL-5553
>                 URL: https://issues.apache.org/jira/browse/DRILL-5553
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider the case discussed in DRILL-5551. Create a slight variation. 
> Input file: CSV with headers:
> {code}
> a,b,c
> 10,foo,bar
> {code}
> As in DRILL-5550, the CSV plugin is configured to use headers.
> Run this (admittedly strange) query:
> {code}
> SELECT *, columns FROM `dfs.data.example.csv`
> {code}
> The resulting schema is:
> {code}
> BatchSchema [fields=[
> a(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
> b(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
> c(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)], 
> columns(INT:OPTIONAL) [$bits$(UINT1:REQUIRED), columns(INT:OPTIONAL)]], 
> selectionVector=NONE]
> {code}
> To make it easier to read:
> {code}
> a(VARCHAR:REQUIRED), 
> b(VARCHAR:REQUIRED),
> c(VARCHAR:REQUIRED),
> columns(INT:OPTIONAL)
> {code}
> In DRILL-5551, {{columns}} changes meaning from an array of columns to a blank normal
> column. Here, it changes meaning again to a nullable Int (our normal "placeholder" for missing
> columns).
> Expected:
> 1. That, per DRILL-5552, no other column reference can occur with "*".
> 2. If item 1 is not fixed, that the scanner (or text reader) forbid the use of either
> "*" or "columns" with other column references.

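The check requested in item 2 of the expected behavior could look roughly like the sketch below. It is illustrative only: the class {{ProjectionCheck}} and the method {{isValidProjection}} are hypothetical names, not part of Drill, and the sketch assumes the projection list arrives as plain column-name strings (it does not handle indexed references such as {{columns[0]}}).

```java
import java.util.List;

// Hypothetical helper, NOT Drill code: a sketch of the rule that "*" or
// "columns" may not be combined with any other column reference.
class ProjectionCheck {

    // Returns true when the projection list is acceptable under that rule.
    static boolean isValidProjection(List<String> projected) {
        boolean hasSpecial = false;
        for (String name : projected) {
            // "*" is the wildcard; "columns" is the text reader's array column.
            if (name.equals("*") || name.equalsIgnoreCase("columns")) {
                hasSpecial = true;
            }
        }
        // A special column is allowed only when it is the sole entry.
        return !hasSpecial || projected.size() == 1;
    }
}
```

Under this rule the query from the description, which projects both "*" and "columns", would be rejected, while "*" alone, "columns" alone, or a list of ordinary names such as a, b would pass.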


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
