drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4694) CTAS in JSON format produces extraneous NULL fields
Date Sat, 04 Jun 2016 01:42:59 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315241#comment-15315241 ]

ASF GitHub Bot commented on DRILL-4694:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/514#discussion_r65795030
  
    --- Diff: exec/java-exec/src/main/codegen/templates/JsonOutputRecordWriter.java ---
    @@ -120,7 +127,13 @@ public void writeField() throws IOException {
       <#elseif mode.prefix == "Repeated" >
         gen.write${typeName}(i, reader);
       <#else>
    +    <#if mode.prefix = "Nullable" >
    --- End diff --
    
    same as above


> CTAS in JSON format produces extraneous NULL fields
> ---------------------------------------------------
>
>                 Key: DRILL-4694
>                 URL: https://issues.apache.org/jira/browse/DRILL-4694
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> Consider the following JSON file: 
> {noformat}
> // file t2.json
> {
> "X" : {
>   "key1" : "value1",
>   "key2" : "value2"
>   } 
> }
> {
> "X" : {
>   "key3" : "value3",
>   "key4" : "value4"
>   }
> }
> {
> "X" : {
>   "key5" : "value5",
>   "key6" : "value6"
>   }
> }
> {noformat}
> Now create a table in JSON format using CTAS:
> {noformat}
> 0: jdbc:drill:zk=local> alter session set `store.format` = 'json';
> 0: jdbc:drill:zk=local> create table dfs.tmp.jt12 as select t.`X` from `t2.json` t;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 3                          |
> +-----------+----------------------------+
> {noformat}
> The output file has rows with the union schema of all the fields in all the records. This
> creates extraneous null fields in the output:
> {noformat}
> $ cat jt12/0_0_0.json 
> {
>   "X" : {
>     "key1" : "value1",
>     "key2" : "value2",
>     "key3" : null,
>     "key4" : null,
>     "key5" : null,
>     "key6" : null
>   }
> } {
>   "X" : {
>     "key1" : null,
>     "key2" : null,
>     "key3" : "value3",
>     "key4" : "value4",
>     "key5" : null,
>     "key6" : null
>   }
> } {
>   "X" : {
>     "key1" : null,
>     "key2" : null,
>     "key3" : null,
>     "key4" : null,
>     "key5" : "value5",
>     "key6" : "value6"
>   }
> }
> {noformat}
> Note that if I change the output format to CSV or Parquet, no null fields are created
> in the output file. The expectation for a CTAS in JSON format is that the output should
> match the input JSON data.
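
The behavior described above can be reproduced outside Drill with a small sketch. The Python below is not Drill code and does not reflect how the JSON record writer is implemented; the helper names `union_schema_rows` and `drop_nulls` are hypothetical, chosen only for illustration. It widens the three records from t2.json to the union of all keys under "X" (which yields the null fields shown in the issue), then removes null-valued map entries to recover the expected output.

```python
import json

# The three input records from t2.json, as given in the issue description.
records = [
    {"X": {"key1": "value1", "key2": "value2"}},
    {"X": {"key3": "value3", "key4": "value4"}},
    {"X": {"key5": "value5", "key6": "value6"}},
]


def union_schema_rows(rows):
    """Hypothetical helper: give every row every key seen under "X".

    Keys that a row does not have are filled with None (JSON null),
    which mimics the output reported in the issue.
    """
    all_keys = []
    for row in rows:
        for key in row["X"]:
            if key not in all_keys:
                all_keys.append(key)
    return [{"X": {k: row["X"].get(k) for k in all_keys}} for row in rows]


def drop_nulls(value):
    """Hypothetical helper: recursively remove map entries whose value is None."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value


widened = union_schema_rows(records)          # rows with extraneous nulls
cleaned = [drop_nulls(row) for row in widened]  # nulls stripped again

print(json.dumps(widened[1], indent=2))
print(json.dumps(cleaned[1], indent=2))
```

One limitation of this sketch: it also drops a null that was written explicitly in the input, so it only matches the input when the source data contains no explicit nulls, as is the case for t2.json.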



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
