drill-issues mailing list archives

From "Vitalii Diravka (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (DRILL-4614) Drill must appoint one data type per one column for self-describing data while querying directories
Date Wed, 20 Apr 2016 11:08:25 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka closed DRILL-4614.
----------------------------------
    Resolution: Fixed

The problem is already described in https://issues.apache.org/jira/browse/DRILL-3806

> Drill must appoint one data type per one column for self-describing data while querying directories
> ----------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4614
>                 URL: https://issues.apache.org/jira/browse/DRILL-4614
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.6.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>             Fix For: 1.7.0
>
>         Attachments: data.json
>
>
> When Drill selects data from a directory and detects data types on the fly,
> a single field can end up with several data types.
> For example:
> 1. Create an input file as follows
> 20K rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> 200 rows with the following - 
> {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
> 2. CTAS as follows
> {code:sql}
> CREATE TABLE dfs.`tmp`.`tp` as select * from dfs.`data.json` t
> {code}
> In this case the Parquet table is created as a folder containing two files.
> 3. Select the data
> {code}
> select t.others.additional from dfs.`tmp`.`tp` t
> {code}
> *The result of the select is a mix of EXPR$0<INT(OPTIONAL)> and EXPR$0<VARCHAR(OPTIONAL)>.*
> This happens because Drill determines the column data type per file.
> The same result occurs with JSON files.
> Since the streaming aggregate does not support schema changes, this issue makes it
> impossible to use aggregate functions on the query results.
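
For anyone without the data.json attachment, the input from step 1 could be approximated with a short script. This is only a sketch: the function name write_sample is made up for illustration, "20K" is assumed to mean exactly 20,000 rows, and the real attachment may differ in other ways.

```python
import json


def write_sample(path, base_rows=20_000, extra_rows=200):
    """Write newline-delimited JSON shaped like step 1 of the report.

    Assumption: one JSON object per line, base rows first, then the
    rows that carry the extra "additional" field.
    """
    base = {
        "some": "yes",
        "others": {"other": "true", "all": "false", "sometimes": "yes"},
    }
    # The trailing rows add one more nested field, "additional".
    extra = {
        "some": "yes",
        "others": {**base["others"], "additional": "last entries only"},
    }
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(base_rows):
            f.write(json.dumps(base, separators=(",", ":")) + "\n")
        for _ in range(extra_rows):
            f.write(json.dumps(extra, separators=(",", ":")) + "\n")
    return base_rows + extra_rows


if __name__ == "__main__":
    write_sample("data.json")
```

The field "additional" appears only in the last 200 rows, so any file produced from the earlier rows alone never sees a value for it, which is the condition the report describes.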



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
