drill-issues mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3353) Non data-type related schema changes errors
Date Mon, 06 Jul 2015 23:46:05 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615910#comment-14615910
] 

Steven Phillips commented on DRILL-3353:
----------------------------------------

Is it possible to share your data? It would make it much easier to reproduce and fix the problem.

> Non data-type related schema changes errors
> -------------------------------------------
>
>                 Key: DRILL-3353
>                 URL: https://issues.apache.org/jira/browse/DRILL-3353
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.0.0
>            Reporter: Oscar Bernal
>            Assignee: Steven Phillips
>             Fix For: 1.2.0
>
>
> I'm having trouble querying a data set with a varying schema for the fields of a nested object. The majority of my data for a specific type of record has the following nested data:
> {code}
> "attributes":{"daysSinceInstall":0,"destination":"none","logged":"no","nth":1,"type":"organic","wearable":"no"}}
> {code}
> Among those records (hundreds of them) I have only two with a slightly different schema:
> {code}
> "attributes":{"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords","daysSinceInstall":0,"destination":"none","logged":"no","nth":4,"type":"branch","wearable":"no"}}
> {code}
> When trying to query the "new" fields, my queries fail:
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 'Teste-FB-Engagement-Puro-iOS-230615"';
> Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615"
> Fragment 0:0
> [Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;{code}
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE';
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when you are using a ValueWriter of type NullableVarCharWriterImpl.
> File  file.json
> Record  35
> Fragment 0:0
> [Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on ip-10-0-1-16.sa-east-1.compute.internal:31010] (state=,code=0)
> {noformat}
> If I try to extract all "attributes" from those events, Drill will only return a subset of the fields, ignoring the others.
> {noformat}
> 0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App';
> +----------------------------------------------------+
> |                       EXPR$0                       |
> +----------------------------------------------------+
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> | {"logged":"no","wearable":"no","type":"xxxx"}      |
> +----------------------------------------------------+
> {noformat}
> What I find strange is that I have thousands of records in the same file with different schemas for different record types, and all other queries seem to run well.
> Is there something about how Drill infers schema that I might be missing here? Does it infer the schema from a sample of the data and then fail for records that were not taken into account during inference? I suspect I wouldn't get this error if I had hundreds of records with that other schema inside the file, but I can't find anything in the docs or code to support that hypothesis. Perhaps it's just a bug? Is it expected?
> The troubleshooting guide seems to mention something about this, but it is vague, implying only that Drill doesn't fully support schema changes. I thought that applied mostly to data type changes, for which there are other well-documented issues.
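
To narrow down which records differ, it can help to list the distinct key sets found under the nested object and the line where each one first appears. The following is a minimal Python sketch, not taken from the issue: it assumes the file holds one JSON object per line and that the nested path is `event.attributes` (inferred from the queries above). The function name `attribute_schemas` and the sample records are hypothetical, and the line numbers it reports are not guaranteed to match the record numbers in Drill's error messages.

```python
import json
from collections import OrderedDict


def attribute_schemas(lines):
    """Group records by the sorted key set of their event.attributes object.

    Returns an ordered mapping from key-set tuple to the list of 1-based
    line numbers on which that key set occurs. Blank lines are skipped.
    """
    schemas = OrderedDict()
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed layout: {"event": {"attributes": {...}}}; missing parts count as empty.
        attrs = (record.get("event") or {}).get("attributes") or {}
        key_set = tuple(sorted(attrs))
        schemas.setdefault(key_set, []).append(line_no)
    return schemas


# Hypothetical sample records, shaped like the fragments in the description.
sample = [
    '{"si": "a", "event": {"attributes": {"logged": "no", "nth": 1}}}',
    '{"si": "b", "event": {"attributes": {"channel": "Adwords", "logged": "no", "nth": 4}}}',
    '{"si": "c", "event": {"attributes": {"logged": "no", "nth": 2}}}',
]

for keys, line_numbers in attribute_schemas(sample).items():
    print(len(line_numbers), "record(s), first at line", line_numbers[0], keys)
```

Against a real file, passing the open file object (for example `attribute_schemas(open("file.json"))`) would show how rare the extra fields are and how far into the file they first occur, which is the kind of detail a minimal reproduction needs.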



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
