drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1818) Parquet files generated by Drill ignore field names when nested elements are queried
Date Sun, 07 Dec 2014 22:46:12 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237303#comment-14237303 ]

Jacques Nadeau commented on DRILL-1818:
---------------------------------------

LGTM. +1

> Parquet files generated by Drill ignore field names when nested elements are queried
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-1818
>                 URL: https://issues.apache.org/jira/browse/DRILL-1818
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Neeraja
>            Assignee: Steven Phillips
>            Priority: Blocker
>         Attachments: 0_0_0.parquet, DRILL-1818.patch
>
>
> I observed this with this Parquet file; more comprehensive testing might be needed here. The issue is that Drill seems to simply ignore field names at the leaf level and accesses data in a positional fashion.
> Below is the repro.
> 1. Generate a Parquet file using Drill. The input is the JSON document below.
> create table dfs.tmp.sampleparquet as (select trans_id, cast(`date` as date) transdate,cast(`time` as time) transtime, cast(amount as double) amount,`user_info`,`marketing_info`, `trans_info` from dfs.`/Users/nrentachintala/Downloads/sample.json` )
> 2. Now run queries.
> Note that in the query below, trans_info has no field named 'keywords', but data is returned positionally anyway (the data returned comes from the prod_id column).
> 0: jdbc:drill:zk=local> select t.`trans_info`.keywords from dfs.tmp.sampleparquet t where t.`trans_info`.keywords is not null;
> +------------+
> |   EXPR$0   |
> +------------+
> | [16]       |
> | []         |
> | [293,90]   |
> | [173,18,121,84,115,226,464,525,35,11,94,45] |
> | [311,29,5,41] |
> 0: jdbc:drill:zk=local> select t.`marketing_info`.keywords from dfs.tmp.sampleparquet t;
> Note that the query above tries to return the first element of marketing_info, camp_id, which is of int type, for the keywords column. But the keywords schema is string, so it throws a type-mismatch error.
> Query failed: Query failed: Failure while running fragment., You tried to write a VarChar type when you are using a ValueWriter of type NullableBigIntWriterImpl. [ c3761403-b8c5-43c1-8e90-2c4918d1f85c on 10.0.0.20:31010 ]
> [ c3761403-b8c5-43c1-8e90-2c4918d1f85c on 10.0.0.20:31010 ]
> Error: exception while executing query: Failure while executing query. (state=,code=0)
> 0: jdbc:drill:zk=local> select t.`marketing_info`.`camp_id`,t.`marketing_info`.keywords from dfs.tmp.sampleparquet t;
> +------------+------------+
> |   EXPR$0   |   EXPR$1   |
> +------------+------------+
> | 4          | ["go","to","thing","watch","made","laughing","might","pay","in","your","hold"] |
> | 6          | ["pronounce","tree","instead","games","sigh"] |
> | 17         | []         |
> | 17         | ["it's"]   |
> | 8          | ["fallout"] |
> +------------+------------+
> Sample.json is below
> {"trans_id":0,"date":"2013-07-26","time":"04:56:59","amount":80.5,"user_info":{"cust_id":28,"device":"IOS5","state":"mt"},"marketing_info":{"camp_id":4,"keywords":["go","to","thing","watch","made","laughing","might","pay","in","your","hold"]},"trans_info":{"prod_id":[16],"purch_flag":"false"}}
> {"trans_id":1,"date":"2013-05-16","time":"07:31:54","amount":100.40,
> "user_info":{"cust_id":86623,"device":"AOS4.2","state":"mi"},"marketing_info":{"camp_id":6,"keywords":["pronounce","tree","instead","games","sigh"]},"trans_info":{"prod_id":[],"purch_flag":"false"}}
> {"trans_id":2,"date":"2013-06-09","time":"15:31:45","amount":20.25,
> "user_info":{"cust_id":11,"device":"IOS5","state":"la"},"marketing_info":{"camp_id":17,"keywords":[]},"trans_info":{"prod_id":[293,90],"purch_flag":"true"}}
> {"trans_id":3,"date":"2013-07-19","time":"11:24:22","amount":500.75,
> "user_info":{"cust_id":666,"device":"IOS5","state":"nj"},"marketing_info":{"camp_id":17,"keywords":["it's"]},"trans_info":{"prod_id":[173,18,121,84,115,226,464,525,35,11,94,45],"purch_flag":"false"}}
> {"trans_id":4,"date":"2013-07-21","time":"08:01:13","amount":34.20,"user_info":{"cust_id":999,"device":"IOS7","state":"ct"},"marketing_info":{"camp_id":8,"keywords":["fallout"]},"trans_info":{"prod_id":[311,29,5,41],"purch_flag":"false"}}
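
To illustrate the distinction the report draws between name-based and positional field access, here is a minimal Python sketch using the nested fields of the first sample record. It is not Drill's reader code; the helper names `lookup_by_name` and `lookup_by_position` are hypothetical and exist only to model the two behaviors.

```python
import json

# Nested fields of the first record from the sample.json in the report.
record = json.loads(
    '{"marketing_info":{"camp_id":4,"keywords":["go","to","thing","watch",'
    '"made","laughing","might","pay","in","your","hold"]},'
    '"trans_info":{"prod_id":[16],"purch_flag":"false"}}'
)


def lookup_by_name(struct, field):
    """Hypothetical model of name-based resolution: a missing field yields None."""
    return struct.get(field)


def lookup_by_position(struct, index):
    """Hypothetical model of positional resolution: the requested name is ignored."""
    return list(struct.values())[index]


# trans_info has no 'keywords' field, so a name-based lookup finds nothing.
trans_by_name = lookup_by_name(record["trans_info"], "keywords")

# Positional access at index 0 returns prod_id data instead, as in the first query.
trans_by_position = lookup_by_position(record["trans_info"], 0)

# For marketing_info, index 0 is the integer camp_id rather than the string list,
# which is the kind of mix-up behind the VarChar/BigInt type mismatch.
marketing_by_position = lookup_by_position(record["marketing_info"], 0)

print(trans_by_name, trans_by_position, marketing_by_position)
```

Under this model, the name-based lookup is the behavior the report expects, and the positional lookup reproduces the values the queries returned.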



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
