drill-issues mailing list archives

From "Vitalii Diravka (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query
Date Wed, 24 Jan 2018 17:12:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka updated DRILL-4185:
    Labels: doc-impacting  (was: )

> UNION ALL involving empty directory on any side of union all results in Failed query
> ------------------------------------------------------------------------------------
>                 Key: DRILL-4185
>                 URL: https://issues.apache.org/jira/browse/DRILL-4185
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.4.0
>            Reporter: Khurram Faraaz
>            Assignee: Vitalii Diravka
>            Priority: Major
>              Labels: doc-impacting
> A UNION ALL query that involves an empty directory on either side of the UNION ALL operator results in a FAILED query. We should return the results for the non-empty side (input) of the UNION ALL.
> Note that empty_DIR is an empty directory: the directory exists, but it has no files in it.
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] (state=,code=0)
> {code}
> *Fix overview:*
> Drill can now query an empty directory. For now, such a directory is treated as a schemaless Drill table.
> Users can query an empty directory and use it in queries with any JOIN and UNION (UNION ALL) operators. It works similarly to empty files.
> An empty directory that contains only parquet metadata cache files is a schemaless Drill table as well.

> *Code changes:*
> Internally, an empty directory is interpreted as a DynamicDrillTable with a null selection. SchemalessScan, SchemalessBatchCreator and SchemalessBatch are introduced and used at the execution stage for interactions with other operators and batches.
> If an empty directory contains parquet metadata cache files, the ParquetGroupScan for such a table is not valid, and SchemalessScan is used instead.
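
The behavior described in the fix overview can be illustrated with a small toy model. This is only a sketch in Python, not Drill code: the functions `data_files`, `scan` and `union_all` are hypothetical names, and treating files whose names start with a dot as metadata cache files is an assumption made for the example. It shows a scan over an empty directory producing no rows, so that UNION ALL returns only the rows of the non-empty input, whichever side the empty directory is on.

```python
import os
from typing import Iterable, Iterator, List


def data_files(directory: str) -> List[str]:
    """List data files in a directory.

    Assumption for this sketch: hidden files (names starting with '.')
    stand in for metadata cache files and are not treated as data.
    """
    return sorted(
        name
        for name in os.listdir(directory)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(directory, name))
    )


def scan(directory: str) -> Iterator[List[str]]:
    """Yield comma-separated rows as lists of strings.

    A directory with no data files yields nothing and imposes no schema,
    which loosely mirrors the 'schemaless table' idea in the fix overview.
    """
    for name in data_files(directory):
        with open(os.path.join(directory, name)) as handle:
            for line in handle:
                yield line.rstrip("\n").split(",")


def union_all(*inputs: Iterable) -> Iterator:
    """Concatenate inputs in order; an empty input contributes no rows."""
    for rows in inputs:
        yield from rows
```

In this model, `union_all(scan(empty_dir), scan(data_dir))` and the same call with the arguments swapped produce the same rows, which is the result the issue asks for instead of a validation error.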

This message was sent by Atlassian JIRA
