drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
Date Wed, 18 May 2016 15:40:12 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289167#comment-15289167 ]

Aman Sinha commented on DRILL-4679:
-----------------------------------

With my proposed fix, which generates an empty MapVector in ProjectRecordBatch, the regression
tests pass, and a few new tests I tried also pass. Here's an example of such a query (it
used to fail because the right side of the union-all is empty):
{noformat}
0: jdbc:drill:zk=local> SELECT CONVERT_FROM('[{a : 100, b: 200}, {a:300, b: 400}]' ,'JSON') AS MYCOL1  FROM (VALUES(1))
union all
SELECT CONVERT_FROM('[{a : 100, b: 200}, {a:300, b: 400}]' ,'JSON') AS MYCOL1  FROM (VALUES(1)) where 1 = 0;
+----------------------------------------+
|                 MYCOL1                 |
+----------------------------------------+
| [{"a":100,"b":200},{"a":300,"b":400}]  |
+----------------------------------------+
{noformat}
[~jni], [~jaltekruse] (or anyone else): if you have thoughts about this, let me know. I
will run more tests before creating a PR.

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> -------------------------------------------------------------------------------
>
>                 Key: DRILL-4679
>                 URL: https://issues.apache.org/jira/browse/DRILL-4679
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> CONVERT_FROM() json format fails as below if the underlying Filter produces 0 rows: 
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'json') as x from cp.`tpch/region.parquet` where r_regionkey = 9999;
> Error: SYSTEM ERROR: IllegalStateException: next() returned NONE without first returning OK_NEW_SCHEMA [#16, ProjectRecordBatch]
> Fragment 0:0
> {noformat}
> If the conversion uses the UTF8 format, the same query succeeds:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'utf8') as x from cp.`tpch/region.parquet` where r_regionkey = 9999;
> +----+
> | x  |
> +----+
> +----+
> No rows selected (0.241 seconds)
> {noformat}
> The reason for this is the special handling of JSON in ProjectRecordBatch. The output schema
> is not known until run time, and the ComplexWriter in the Project relies on seeing the input
> data to determine the output schema, which could be a MapVector, a ListVector, etc.
> If the input data has 0 rows due to a filter condition, we should at least produce a
> default output schema, e.g. an empty MapVector? We need to decide on a good default. Note that
> CONVERT_FROM(x, 'json') could occur on both branches of a UNION-ALL, and if one input is empty
> while the other is not, the two sides may still be incompatible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
