drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5559) Incorrect query result when querying json files with schema change
Date Wed, 31 May 2017 21:21:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031982#comment-16031982 ]

Jinfeng Ni commented on DRILL-5559:
-----------------------------------

The build is from today's master branch, with commit id:

{code}
select commit_id, commit_message from sys.version;
+-------------------------------------------+--------------------------------------------------------------+
|                 commit_id                 |                        commit_message                        |
+-------------------------------------------+--------------------------------------------------------------+
| d11aba2e55323bb5a6a9deb5bb09fd87470dcedf  | DRILL-4335: Apache Drill should support network encryption.  |
+-------------------------------------------+--------------------------------------------------------------+
{code}

> Incorrect query result when querying json files with schema change
> ------------------------------------------------------------------
>
>                 Key: DRILL-5559
>                 URL: https://issues.apache.org/jira/browse/DRILL-5559
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>
> Have two JSON files with a nested structure. In the first one, `a.b` is a bigint, while in the second one, `a.b` is a float.
> {code}
> cat 1.json
> {a:{b:100}}
> cat 2.json
> {a:{b:200.0}}
> {code}
> The following query returns a wrong result for the second row. Notice that the value has changed from 200.0 to 4641240890982006784.
> {code}
> select a from dfs.tmp.t2;
> +----------------------------+
> |             a              |
> +----------------------------+
> | {"b":100}                  |
> | {"b":4641240890982006784}  |
> +----------------------------+
> {code}
> Explain plan output:
> {code}
> explain plan for select a from dfs.tmp.t2;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(a=[$0])
> 00-02        Scan(groupscan=[EasyGroupScan [selectionRoot=file:/tmp/t2, numFiles=2, columns=[`a`], files=[file:/tmp/t2/1.json, file:/tmp/t2/2.json]]])
> {code}
> If the involved operators cannot handle the schema change, at minimum we should fail the query with a SchemaChangeException error instead of returning wrong query results.
> Another interesting observation: if we query the field `a.b` instead of `a`, Drill returns the correct result.
> {code}
> select t.a.b from dfs.tmp.t2 t;
> +---------+
> | EXPR$0  |
> +---------+
> | 100     |
> | 200.0   |
> +---------+
> {code}
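For reference, the bogus value in the second row does not look arbitrary: 4641240890982006784 is 0x4069000000000000, which is exactly the IEEE 754 bit pattern of the double 200.0 read back as a signed 64-bit integer. That suggests (an assumption, not checked against the Drill source) that the float value from 2.json is being read through a bigint accessor instead of being converted or rejected. The short Python sketch only reproduces that bit reinterpretation; it is not Drill code, and the helper names are hypothetical.

```python
import struct

# Hypothetical helpers (not part of Drill): reinterpret the same 8 bytes
# as either an IEEE 754 double or a signed 64-bit integer, with no conversion.

def double_bits_as_int64(x: float) -> int:
    """Pack x as a little-endian double, then unpack those bytes as an int64."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

def int64_bits_as_double(n: int) -> float:
    """Pack n as a little-endian int64, then unpack those bytes as a double."""
    return struct.unpack("<d", struct.pack("<q", n))[0]

# The value Drill printed for `a.b` in the second row.
bad = 4641240890982006784

print(double_bits_as_int64(200.0))  # 4641240890982006784
print(hex(bad))                     # 0x4069000000000000
print(int64_bits_as_double(bad))    # 200.0
```

The round trip recovers 200.0, which is consistent with the data itself being intact and only its type being misinterpreted on the way out.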



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
