drill-issues mailing list archives

From "Oscar Bernal (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-3353) Non data-type related schema changes errors
Date Tue, 07 Jul 2015 01:04:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616012#comment-14616012 ]

Oscar Bernal edited comment on DRILL-3353 at 7/7/15 1:03 AM:
-------------------------------------------------------------

Absolutely! The following file contains the data which produces the errors reported in this
issue. Please let me know if I can help with anything else. Thanks!


was (Author: obernal):
Absolutely! The following file corresponds contains the data which produces the errors reported
in this issue. Please let me know if I can help with anything else. Thanks!

> Non data-type related schema changes errors
> -------------------------------------------
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Steven Phillips
>             Fix For: 1.2.0
>
>         Attachments: i-bfbc0a5c-ios-PulsarEvent-2015-06-23_19.json.zip
>
>
> I'm having trouble querying a data set with varying schema for nested object fields. The majority of my data for a specific type of record has the following nested data:
> {code}
> "attributes":{"daysSinceInstall":0,"destination":"none","logged":"no","nth":1,"type":"organic","wearable":"no"}}
> {code}
> Among those records (hundreds of them) I have only two with a slightly different schema:
> {code}
> "attributes":{"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords","daysSinceInstall":0,"destination":"none","logged":"no","nth":4,"type":"branch","wearable":"no"}}
> {code}
> When trying to query the "new" fields, my queries fail:
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 'Teste-FB-Engagement-Puro-iOS-230615"';
> Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615"
> Fragment 0:0
> [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE';
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl.
> File  file.json
> Record  35
> Fragment 0:0
> [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> If I try to extract all "attributes" from those events, Drill returns only a subset of the fields and ignores the others:
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App';
> +----------------------------------------------------+
> |                       EXPR$0                       |
> +----------------------------------------------------+
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> +----------------------------------------------------+
> {noformat}
> What I find strange is that I have thousands of records in the same file with different schemas for different record types, and all other queries seem to run well.
> Is there something about how Drill infers schema that I might be missing here? Does it infer the schema from a sample percentage of the data and then fail on records that were not taken into account? I suspect I wouldn't have this error if I had hundreds of records with that other schema in the file, but I can't find anything in the docs or code to support that hypothesis. Perhaps it's just a bug? Is it expected?
> The troubleshooting guide seems to mention this, but it is vague, only implying that Drill doesn't fully support schema changes. I thought that applied mostly to data type changes, for which there are other well-documented issues.

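As a quick check on the first two snippets in the issue description, here is a minimal Python sketch (not part of Drill) that diffs the two `attributes` objects. The literals are copied from the snippets, minus the trailing brace that closes the enclosing event; the variable names are illustrative. It suggests the rarer shape only adds three string fields and that no shared field changes type, which fits the issue title's point that this is not a data-type change.

```python
import json

# The two "attributes" objects from the issue description, copied verbatim.
usual = json.loads('{"daysSinceInstall":0,"destination":"none","logged":"no",'
                   '"nth":1,"type":"organic","wearable":"no"}')
rare = json.loads('{"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset",'
                  '"campaign":"Teste-Adwords-Engagement-Branch-iOS-230615",'
                  '"channel":"Adwords","daysSinceInstall":0,"destination":"none",'
                  '"logged":"no","nth":4,"type":"branch","wearable":"no"}')

# Fields present only in the rare shape, and only in the usual shape.
only_in_rare = sorted(rare.keys() - usual.keys())
only_in_usual = sorted(usual.keys() - rare.keys())

# Shared fields whose decoded value type differs between the two shapes.
type_changes = {k: (type(usual[k]).__name__, type(rare[k]).__name__)
                for k in usual.keys() & rare.keys()
                if type(usual[k]) is not type(rare[k])}

print(only_in_rare)   # ['adSet', 'campaign', 'channel']
print(only_in_usual)  # []
print(type_changes)   # {}
```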

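The DATA_READ error reported with `store.json.all_text_mode` = false reads as if a value that Drill's JSON reader treats as boolean (`Bit`) arrived for a field whose writer had already been set up for strings (`NullableVarCharWriterImpl`). To test that hypothesis against the attached file, a minimal Python sketch (hypothetical helpers, not a Drill tool) can list every dotted field path that holds more than one JSON type across records, with the first record where each type appears. It assumes the file holds concatenated or newline-delimited top-level JSON objects, does not descend into arrays, and numbers records from 1; whether that numbering lines up with Drill's `Record 35` is not assumed.

```python
import json
from collections import defaultdict


def iter_json_objects(text):
    """Yield each top-level JSON value from concatenated or newline-delimited JSON text."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def json_type(value):
    """Map a decoded Python value back to its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def collect_types(value, path, seen, record_no):
    """Note the JSON type at `path`, then recurse into object members (not arrays)."""
    seen[path].setdefault(json_type(value), record_no)
    if isinstance(value, dict):
        for key, child in value.items():
            collect_types(child, f"{path}.{key}" if path else key, seen, record_no)


def mixed_type_fields(text):
    """Return {field path: {JSON type: first 1-based record number}} for mixed-type paths."""
    seen = defaultdict(dict)
    for record_no, obj in enumerate(iter_json_objects(text), start=1):
        collect_types(obj, "", seen, record_no)
    return {path: types for path, types in seen.items() if len(types) > 1}


# Tiny illustration: field a.x is a string in record 1 and a boolean in record 2.
sample = '{"a": {"x": "no"}}\n{"a": {"x": true}}\n'
print(mixed_type_fields(sample))  # {'a.x': {'string': 1, 'boolean': 2}}
```

To run it on the attachment, read the unzipped file with `open(path, encoding="utf-8").read()` and pass the text to `mixed_type_fields`, where `path` is wherever the file lives locally. The whole file is read into memory, which keeps the sketch short but may not suit very large inputs.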

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
