drill-issues mailing list archives

From "Neeraja (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-1818) Parquet files generated by Drill ignore field names when nested elements are queried
Date Sun, 07 Dec 2014 06:36:12 GMT
Neeraja created DRILL-1818:

             Summary: Parquet files generated by Drill ignore field names when nested elements are queried
                 Key: DRILL-1818
                 URL: https://issues.apache.org/jira/browse/DRILL-1818
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Neeraja
            Priority: Blocker

I observed this with the Parquet file generated in the repro below; more comprehensive testing
may be needed. The issue is that Drill seems to ignore field names at the leaf level and
instead accesses data positionally.

Below is the repro.
1. Generate a Parquet file using Drill. The input is the JSON doc below.

create table dfs.tmp.sampleparquet as (
  select trans_id, cast(`date` as date) transdate, cast(`time` as time) transtime,
         cast(amount as double) amount, `user_info`, `marketing_info`, `trans_info`
  from dfs.`/Users/nrentachintala/Downloads/sample.json`
)

2. Now run queries against the table.
Note that in the query below there is no field named 'keywords' in trans_info, yet data is
still returned positionally (the values come from the prod_id column).
0: jdbc:drill:zk=local> select t.`trans_info`.keywords from dfs.tmp.sampleparquet t where t.`trans_info`.keywords is not null;
|   EXPR$0   |
| [16]       |
| []         |
| [293,90]   |
| [173,18,121,84,115,226,464,525,35,11,94,45] |
| [311,29,5,41] |

0: jdbc:drill:zk=local> select t.`marketing_info`.keywords from dfs.tmp.sampleparquet t;

Note that the query above tries to return the first element of marketing_info, camp_id
(an int), for the keywords column. Because the keywords schema is string, it fails with a
type mismatch error.

Query failed: Query failed: Failure while running fragment., You tried to write a VarChar
type when you are using a ValueWriter of type NullableBigIntWriterImpl. [ c3761403-b8c5-43c1-8e90-2c4918d1f85c
on ]
[ c3761403-b8c5-43c1-8e90-2c4918d1f85c on ]

Error: exception while executing query: Failure while executing query. (state=,code=0)

0: jdbc:drill:zk=local> select t.`marketing_info`.`camp_id`,t.`marketing_info`.keywords from dfs.tmp.sampleparquet t;
|   EXPR$0   |   EXPR$1   |
| 4          | ["go","to","thing","watch","made","laughing","might","pay","in","your","hold"]
| 6          | ["pronounce","tree","instead","games","sigh"] |
| 17         | []         |
| 17         | ["it's"]   |
| 8          | ["fallout"] |
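
As a rough illustration of the behavior reported above, the following Python toy model contrasts name-based field lookup with positional lookup. It is not Drill's Parquet reader code. The helper names lookup_by_name and lookup_by_position are hypothetical, and the record values are shortened samples based on the query output above. The field order inside marketing_info (camp_id first) is taken from the report.

```python
# Toy model only: this is NOT Drill's reader implementation, just a sketch of
# name-based versus positional resolution of leaf fields in a nested record.

def lookup_by_name(record, field):
    """Expected behavior: resolve the leaf by name; a missing name yields None."""
    return record.get(field)


def lookup_by_position(record, position):
    """Suspected behavior: ignore the field name and return the leaf at a position."""
    values = list(record.values())
    return values[position] if position < len(values) else None


# Hypothetical records; field names come from the report, values are abbreviated.
trans_info = {"prod_id": [16]}
marketing_info = {"camp_id": 4, "keywords": ["go", "to"]}

# trans_info has no 'keywords' field, so a name-based lookup finds nothing.
print(lookup_by_name(trans_info, "keywords"))

# A positional lookup returns the prod_id values instead.
print(lookup_by_position(trans_info, 0))

# For marketing_info, position 0 is camp_id (an int), not the string keywords.
print(lookup_by_position(marketing_info, 0))
```

In this sketch the first call yields nothing, while the positional calls return prod_id data and the int camp_id, which mirrors the wrong results and the type mismatch seen in the queries above.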

Sample.json is below




