drill-issues mailing list archives

From F Méthot (JIRA) <j...@apache.org>
Subject [jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array
Date Thu, 30 Jun 2016 18:15:10 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357603#comment-15357603 ]

F Méthot commented on DRILL-3562:
---------------------------------

Really would like to see this one fixed!
Here is the workaround we are using to get at our data:

This extracts the data while skipping the empty arrays:
   select t.a.b.c as c from dfs.`flat.json` t where t.a.b.c[0]['d'] is not null
   (d is a field name expected to be found in the array's elements)
but FLATTEN still won't work in that query.
To get FLATTEN working, first materialize the result in a temporary table:
    create table TEMP_JSON_DATA as (select t.a.b.c as c from dfs.`flat.json` t where t.a.b.c[0]['d'] is not null);
then run:
   select flatten(c) from TEMP_JSON_DATA;

(We use the Parquet format for the temp table.)

For interactive data analysis this is a pretty lame workaround, but in a scripting environment
it has worked out fine, as long as you automate dropping the temp table.
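To make the logic of the workaround concrete, here is a small Python model of its two steps, run on the two documents quoted in the issue. It is not Drill code and does not touch Drill at all: the helper names (first_d, extract_non_empty, flatten) are hypothetical, and the model assumes that indexing an empty array yields null (so the WHERE filter drops those documents) and that FLATTEN emits one row per array element.

```python
import json

# The two documents from the issue, one JSON document per line.
SAMPLE = """\
{ "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
{ "a": { "b": { "c": [] } } }
"""

def first_d(c):
    """Model of t.a.b.c[0]['d']: None if the array is empty or 'd' is missing."""
    if not c:
        return None
    first = c[0]
    return first.get("d") if isinstance(first, dict) else None

def extract_non_empty(lines):
    """Step 1 (the CTAS): keep a.b.c only where c[0]['d'] is not null."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        doc = json.loads(line)
        c = doc.get("a", {}).get("b", {}).get("c")
        if first_d(c) is not None:
            rows.append(c)
    return rows

def flatten(rows):
    """Step 2 (SELECT FLATTEN(c)): one output row per array element."""
    return [elem for c in rows for elem in c]

temp_table = extract_non_empty(SAMPLE.splitlines())
flat = flatten(temp_table)
print(flat)  # [{'d': {'e': 'f'}}]
print(sum(1 for r in flat if r["d"]["e"] == "f"))  # 1
```

The document with the empty array never reaches the flatten step, which is the whole point of the filter in the CTAS.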



> Query fails when using flatten on JSON data where some documents have an empty array
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-3562
>                 URL: https://issues.apache.org/jira/browse/DRILL-3562
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.1.0
>            Reporter: Philip Deegan
>             Fix For: Future
>
>
> A Drill query fails when using flatten if some records contain an empty array:
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast org.apache.drill.exec.vector.NullableIntVector to org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated with dummy data?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
