drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5559) Incorrect query result when querying json files with schema change
Date Wed, 31 May 2017 21:18:04 GMT
Jinfeng Ni created DRILL-5559:
---------------------------------

             Summary: Incorrect query result when querying json files with schema change
                 Key: DRILL-5559
                 URL: https://issues.apache.org/jira/browse/DRILL-5559
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jinfeng Ni


Given two JSON files with a nested structure: in the first one, `a.b` is a bigint, while in
the second one, `a.b` is a float.

{code}
cat 1.json
{a:{b:100}}

cat 2.json
{a:{b:200.0}}
{code}

The following query returns a wrong result for the second row. Notice that the value of `b`
has changed from 200.0 to 4641240890982006784.

{code}
select a from dfs.tmp.t2;
+----------------------------+
|             a              |
+----------------------------+
| {"b":100}                  |
| {"b":4641240890982006784}  |
+----------------------------+
{code}
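
One possible reading of the wrong value, offered as an observation rather than a confirmed root cause: 4641240890982006784 is the number you get when the IEEE 754 bit pattern of the double 200.0 is reinterpreted as a signed 64-bit integer. That would be consistent with the float from the second file being read through the bigint type established by the first file. The short Python sketch below only checks the arithmetic; the helper name `double_bits_as_int64` is made up for illustration and is not part of Drill.

```python
import struct


def double_bits_as_int64(value: float) -> int:
    """Reinterpret the 8 bytes of an IEEE 754 double as a signed 64-bit integer."""
    # Pack as a little-endian double, then unpack the same bytes as a
    # little-endian signed 64-bit integer. No numeric conversion happens.
    return struct.unpack("<q", struct.pack("<d", value))[0]


def int64_bits_as_double(bits: int) -> float:
    """Inverse of double_bits_as_int64: read the integer's bytes as a double."""
    return struct.unpack("<d", struct.pack("<q", bits))[0]


if __name__ == "__main__":
    print(double_bits_as_int64(200.0))                # 4641240890982006784
    print(int64_bits_as_double(4641240890982006784))  # 200.0
```

If this reading is right, the data is not lost, only mislabeled: the same 8 bytes decode back to 200.0 when read as a double.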

Explain plan output:
{code}
explain plan for select a from dfs.tmp.t2;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(a=[$0])
00-02        Scan(groupscan=[EasyGroupScan [selectionRoot=file:/tmp/t2, numFiles=2, columns=[`a`],
files=[file:/tmp/t2/1.json, file:/tmp/t2/2.json]]])
{code}

If the operators involved cannot handle the schema change, we should at minimum fail the query
with a SchemaChangeException instead of returning wrong query results.
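
As a rough illustration of the fail-fast behavior proposed here, the sketch below walks nested records and raises as soon as a leaf field changes type between records. It is a standalone Python model, not Drill code: `SchemaChangeError`, `leaf_types`, and `check_schema` are hypothetical names, and Python's `int` and `float` stand in for Drill's bigint and float types.

```python
class SchemaChangeError(Exception):
    """Hypothetical stand-in for Drill's SchemaChangeException."""


def leaf_types(record, prefix=""):
    """Map each dotted leaf path (e.g. 'a.b') to the type name of its value."""
    types = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested maps so `a.b` is tracked, not just `a`.
            types.update(leaf_types(value, path + "."))
        else:
            types[path] = type(value).__name__
    return types


def check_schema(records):
    """Raise SchemaChangeError when a leaf path changes type between records."""
    seen = {}
    for index, record in enumerate(records):
        for path, type_name in leaf_types(record).items():
            previous = seen.setdefault(path, type_name)
            if previous != type_name:
                raise SchemaChangeError(
                    f"record {index}: `{path}` changed from {previous} to {type_name}"
                )
    return seen


if __name__ == "__main__":
    rows = [{"a": {"b": 100}}, {"a": {"b": 200.0}}]  # mirrors 1.json and 2.json
    try:
        check_schema(rows)
    except SchemaChangeError as err:
        print("query failed:", err)
```

The point of the sketch is only that the check has to descend into the nested map: comparing the type of `a` alone (a map in both files) would not notice that `a.b` changed.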

Another interesting observation: if we query the field `a.b` instead of `a`, Drill returns
the correct result.

{code}
select t.a.b from dfs.tmp.t2 t;
+---------+
| EXPR$0  |
+---------+
| 100     |
| 200.0   |
+---------+
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
