Date: Tue, 30 May 2017 00:30:04 +0000 (UTC)
From: "Paul Rogers (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-5553) SELECT *, columns produces nonsense results

    [ https://issues.apache.org/jira/browse/DRILL-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028678#comment-16028678 ]

Paul Rogers commented on DRILL-5553:
------------------------------------

The problem appears to be in the planner, not the CSV reader. The following is a snippet of the physical plan given to the CSV reader:

{code}
"columns" : [ "`*`" ],
{code}

As Arina noted elsewhere, the planner "compresses" the "columns" column into * for the purposes of the scanner, but somehow expands it elsewhere. Since "columns" is special only to the CSV reader, but not to Drill, the Project operator (perhaps) does not know that "columns" is supposed to be a Varchar array.

> SELECT *, columns produces nonsense results
> -------------------------------------------
>
>                 Key: DRILL-5553
>                 URL: https://issues.apache.org/jira/browse/DRILL-5553
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider the case discussed in DRILL-5551. Create a slight variation.
> Input file: CSV with headers:
> {code}
> a,b,c
> 10,foo,bar
> {code}
> As in DRILL-5550, the CSV plugin is configured to use headers.
> Run this (admittedly strange) query:
> {code}
> SELECT *, columns FROM `dfs.data.example.csv`
> {code}
> The resulting schema is:
> {code}
> BatchSchema [fields=[
> a(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)],
> b(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)],
> c(VARCHAR:REQUIRED) [$offsets$(UINT4:REQUIRED)],
> columns(INT:OPTIONAL) [$bits$(UINT1:REQUIRED), columns(INT:OPTIONAL)]],
> selectionVector=NONE]
> {code}
> To make it easier to read:
> {code}
> a(VARCHAR:REQUIRED),
> b(VARCHAR:REQUIRED),
> c(VARCHAR:REQUIRED),
> columns(INT:OPTIONAL)
> {code}
> In DRILL-5551, {{columns}} changes meaning from an array of columns to a blank normal column. Here, it changes meaning again, to a nullable Int (our normal "placeholder" for missing columns).
> Expected:
> 1. That, per DRILL-5552, no other column reference can occur with "*".
> 2. If item 1 is not fixed, that the scanner (or text reader) forbid the use of either "*" or "columns" with other column references.
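
For illustration only, the check requested in expected item 2 might look roughly like the sketch below. This is not Drill code: the class name {{ProjectionCheck}} and both methods are hypothetical, the projection list is modeled as plain strings rather than Drill's own column-path type, and matching "columns" case-insensitively is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of Drill: rejects a projection list that
// mixes "*" or "columns" with any other column reference.
public class ProjectionCheck {

    // Assumption: "columns" is matched case-insensitively; "*" is the wildcard.
    static boolean isSpecial(String col) {
        return col.equals("*") || col.equalsIgnoreCase("columns");
    }

    // Throws if a special column appears alongside any other entry.
    static void validate(List<String> projection) {
        if (projection.size() <= 1) {
            return; // a lone "*", a lone "columns", or one plain column is fine
        }
        for (String col : projection) {
            if (isSpecial(col)) {
                throw new IllegalArgumentException(
                    "'" + col + "' cannot be combined with other column references: "
                        + projection);
            }
        }
    }

    public static void main(String[] args) {
        validate(Arrays.asList("a", "b"));   // plain columns: accepted
        validate(Arrays.asList("*"));        // wildcard alone: accepted
        validate(Arrays.asList("columns"));  // columns array alone: accepted
        try {
            validate(Arrays.asList("*", "columns")); // the query from this issue
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A check of this kind would only help if the reader saw the original projection list. As the physical plan snippet in the comment shows, the scanner currently receives just "`*`", so the planner would have to pass the list through unchanged, or perform the check itself.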