drill-dev mailing list archives

From "Oscar Bernal (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3353) Non data-type related schema changes errors
Date Wed, 24 Jun 2015 15:13:05 GMT
Oscar Bernal created DRILL-3353:
-----------------------------------

             Summary: Non data-type related schema changes errors
                 Key: DRILL-3353
                 URL: https://issues.apache.org/jira/browse/DRILL-3353
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - JSON
    Affects Versions: 1.0.0
            Reporter: Oscar Bernal
            Assignee: Steven Phillips


I'm having trouble querying a data set in which the fields of a nested object vary between
records. The majority of my data for a specific type of record has the following nested data:

{code}
"attributes":{"daysSinceInstall":0,"destination":"none","logged":"no","nth":1,"type":"organic","wearable":"no"}}
{code}

Among those records (hundreds of them) I have only two with a slightly different schema:

{code}
"attributes":{"adSet":"Teste-Adwords-Engagement-Branch-iOS-230615-adset","campaign":"Teste-Adwords-Engagement-Branch-iOS-230615","channel":"Adwords","daysSinceInstall":0,"destination":"none","logged":"no","nth":4,"type":"branch","wearable":"no"}}
{code}
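
To confirm how many records carry each variant of "attributes", a small script outside Drill can count the distinct key sets. This is only a minimal sketch: it assumes the file holds one JSON record per line and that the nested object lives at `event.attributes` (as the queries in this report suggest). The function name `attribute_key_sets` and the shortened sample records are hypothetical.

```python
import json
from collections import Counter


def attribute_key_sets(lines):
    """Count how often each distinct set of `event.attributes` keys occurs."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed layout: {"event": {"attributes": {...}}, ...}
        event = record.get("event") or {}
        attributes = event.get("attributes") or {}
        counts[tuple(sorted(attributes))] += 1
    return counts


# Shortened, made-up records that mirror the two shapes shown above.
sample = [
    '{"event":{"attributes":{"logged":"no","nth":1,"type":"organic"}}}',
    '{"event":{"attributes":{"logged":"no","nth":1,"type":"organic"}}}',
    '{"event":{"attributes":{"campaign":"c","logged":"no","nth":4,"type":"branch"}}}',
]

for keys, count in attribute_key_sets(sample).most_common():
    print(count, keys)
```

Against the real file, the same function can be fed an open file handle instead of the `sample` list.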

When trying to query the "new" fields, my queries fail:

With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = true;{code}

{noformat}
0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log
where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.event.attributes.ad = 'Teste-FB-Engagement-Puro-iOS-230615"';
Error: SYSTEM ERROR: java.lang.NumberFormatException: Teste-FB-Engagement-Puro-iOS-230615"

Fragment 0:0

[Error Id: 22d37a65-7dd0-4661-bbfc-7a50bbee9388 on ip-10-0-1-16.sa-east-1.compute.internal:31010]
(state=,code=0)
{noformat}

With {code:sql}ALTER SYSTEM SET `store.json.all_text_mode` = false;{code}

{noformat}
0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log
where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE';
Error: DATA_READ ERROR: Error parsing JSON - You tried to write a Bit type when you are using
a ValueWriter of type NullableVarCharWriterImpl.

File  file.json
Record  35
Fragment 0:0

[Error Id: 5746e3e9-48c0-44b1-8e5f-7c94e7c64d0f on ip-10-0-1-16.sa-east-1.compute.internal:31010]
(state=,code=0)
{noformat}

If I try to extract all "attributes" from those events, Drill returns only a subset of
the fields and ignores the others.

{noformat}
0: jdbc:drill:zk=local> select log.event.attributes from `dfs`.`root`.`/file.json` as log
where log.si = '07A3F985-4B34-4A01-9B83-3B14548EF7BE' and log.type ='Opens App';
+----------------------------------------------------+
|                       EXPR$0                       |
+----------------------------------------------------+
| {"logged":"no","wearable":"no","type":"xxxx"}      |
| {"logged":"no","wearable":"no","type":"xxxx"}      |
| {"logged":"no","wearable":"no","type":"xxxx"}      |
| {"logged":"no","wearable":"no","type":"xxxx"}      |
| {"logged":"no","wearable":"no","type":"xxxx"}      |
+----------------------------------------------------+
{noformat}

What I find strange is that I have thousands of records in the same file with different schemas
for different record types, and all other queries seem to run well.

Is there something about how Drill infers schema that I might be missing here? Does it infer
the schema from a sample of the data and then fail for records that were not taken into account?
I suspect I wouldn't see this error if the file contained hundreds of records with that
other schema, but I can't find anything in the docs or code to support that
hypothesis. Is this expected behavior, or is it just a bug?
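
One way to test that hypothesis would be to generate small files that differ only in how many records carry the extra fields, then run the same queries against each. The following is a rough sketch, not a verified reproduction: the helper names, the placeholder values in `EXTRA`, and the choice to append the rarer records at the end of the file are all assumptions.

```python
import json

# Fields present in the majority of records (taken from the sample above).
COMMON = {"daysSinceInstall": 0, "destination": "none", "logged": "no",
          "nth": 1, "type": "organic", "wearable": "no"}

# Additional fields present in only a few records (placeholder values).
EXTRA = {"adSet": "example-adset", "campaign": "example-campaign",
         "channel": "Adwords"}


def build_records(common_count, extra_count):
    """Return common_count plain records followed by extra_count records
    that also carry the EXTRA fields."""
    records = [{"event": {"attributes": dict(COMMON)}}
               for _ in range(common_count)]
    records += [{"event": {"attributes": {**EXTRA, **COMMON}}}
                for _ in range(extra_count)]
    return records


def write_json_lines(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
```

For example, `write_json_lines("few.json", build_records(300, 2))` and `write_json_lines("many.json", build_records(300, 300))` would produce two files to compare. The file names here are arbitrary.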

The troubleshooting guide seems to mention something about this, but it is very vague and only
implies that Drill doesn't fully support schema changes. I thought that applied mostly to data
type changes, for which there are other well-documented issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
