drill-dev mailing list archives

From Stefán Baxter (JIRA) <j...@apache.org>
Subject [jira] [Created] (DRILL-3563) Type confusion and number formatting exceptions
Date Mon, 27 Jul 2015 13:09:04 GMT
Stefán Baxter created DRILL-3563:
------------------------------------

             Summary: Type confusion and number formatting exceptions
                 Key: DRILL-3563
                 URL: https://issues.apache.org/jira/browse/DRILL-3563
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.1.0
            Reporter: Stefán Baxter
            Assignee: Jinfeng Ni


It seems that null values can cause a column to be treated as numeric during expression
evaluation, regardless of its content or other indicators, and that fields in substructures
can affect same-named fields in the parent structure.
(1.2-SNAPSHOT, Parquet files)

I have JSON data that can be reduced to this:
{"occurred_at":"2015-07-26 08:45:41.234","type":"plan.item.added","dimensions":{"type":null,"dim_type":"Unspecified","category":"Unspecified","sub_category":null}}
{"occurred_at":"2015-07-26 08:45:43.598","type":"plan.item.removed","dimensions":{"type":"Unspecified","dim_type":null,"category":"Unspecified","sub_category":null}}
{"occurred_at":"2015-07-26 08:45:44.241","type":"plan.item.removed","dimensions":{"type":"To See","category":"Nature","sub_category":"Waterfalls"}}
* Note the discrepancy in the dimensions structure: the type field is called either
type or dim_type (slightly relevant to the rest of this case).
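
For anyone reproducing this, the sample above can be recreated as a newline-delimited JSON file. This is only a minimal sketch, assuming Python is available: the file name events_sample.json and the helper write_sample are hypothetical, and converting the file to Parquet (the format used in this report) is a separate step not shown here.

```python
import json

# The three sample records from this report, reproduced as Python dicts.
# JSON null becomes None; the third record has no dim_type key at all.
SAMPLE_EVENTS = [
    {"occurred_at": "2015-07-26 08:45:41.234", "type": "plan.item.added",
     "dimensions": {"type": None, "dim_type": "Unspecified",
                    "category": "Unspecified", "sub_category": None}},
    {"occurred_at": "2015-07-26 08:45:43.598", "type": "plan.item.removed",
     "dimensions": {"type": "Unspecified", "dim_type": None,
                    "category": "Unspecified", "sub_category": None}},
    {"occurred_at": "2015-07-26 08:45:44.241", "type": "plan.item.removed",
     "dimensions": {"type": "To See", "category": "Nature",
                    "sub_category": "Waterfalls"}},
]


def write_sample(path="events_sample.json"):
    """Write one JSON object per line (hypothetical helper, not part of Drill)."""
    with open(path, "w", encoding="utf-8") as fh:
        for event in SAMPLE_EVENTS:
            fh.write(json.dumps(event) + "\n")
    return path


if __name__ == "__main__":
    print(write_sample())
```

The resulting file holds one record per line, with the same mix of null, missing, and string values for dimensions.type and dimensions.dim_type as the data above.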

1. Query where dimensions are not involved
select p.type, count(*)
from dfs.tmp.`/analytics/processed/<some-tenant>/events` as p
where occurred_at > '2015-07-26' and p.type in ('plan.item.added','plan.item.removed')
group by p.type;
+--------------------+---------+
|        type        | EXPR$1  |
+--------------------+---------+
| plan.item.removed  | 947     |
| plan.item.added    | 40342   |
+--------------------+---------+
2 rows selected (0.508 seconds)

2. Same query, but involving dimensions.type as well

select p.type, coalesce(p.dimensions.dim_type, p.dimensions.type) dimensions_type, count(*)
from dfs.tmp.`/analytics/processed/<some-tenant>/events` as p
where occurred_at > '2015-07-26' and p.type in ('plan.item.added','plan.item.removed')
group by p.type, coalesce(p.dimensions.dim_type, p.dimensions.type);

Error: SYSTEM ERROR: NumberFormatException: To See
Fragment 2:0
[Error Id: 4756f549-cc47-43e5-899e-10a11efb60ea on localhost:31010] (state=,code=0)

I can provide test data if this is not enough to reproduce this bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
