drill-issues mailing list archives

From "Volodymyr Vysotskyi (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-4264) Dots in identifier are not escaped correctly
Date Thu, 13 Jul 2017 13:43:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085700#comment-16085700 ]

Volodymyr Vysotskyi edited comment on DRILL-4264 at 7/13/17 1:42 PM:
---------------------------------------------------------------------

Currently Drill behaves inconsistently when querying a file whose field names contain dots and therefore need quotes. The query
{code:sql}
select * from test_table t
{code}
fails, but the query
{code:sql}
select `rk.q` as `rk.q` from test_table t
{code}
returns the correct result for the file
{noformat}
{"rk.q": "a", "m": {"a.b":"1", "a":{"b":"2"}, "c":"3"}}
{noformat}
The difference between these two cases is that in the second case the field reference was created
using the method {{FieldReference.getWithQuotedRef(field.getName())}}, which does not [check|https://github.com/apache/drill/blob/90f43bff7a01eaaee6c8861137759b05367dfcf3/logical/src/main/java/org/apache/drill/common/expression/FieldReference.java#L54]
the field name. In the first case, the constructor that performs the check was [used|https://github.com/apache/drill/blob/416ec70a616e8d12b5c7fca809763b977d2f7aad/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java#L360].
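To make the difference concrete, here is a simplified model of the two code paths. It is a hypothetical sketch, not Drill's {{FieldReference}} code: the class {{FieldRefModel}} and its methods {{checked}} and {{quoted}} are illustrative names, and the check is reduced to "the name contains a dot", mirroring the error message quoted in the issue description.

```java
// Hypothetical, simplified model of the two ways a field reference is built.
// This is NOT Drill's FieldReference class; all names here are illustrative.
final class FieldRefModel {
    private final String name;

    private FieldRefModel(String name) {
        this.name = name;
    }

    // Models the constructor "with check": a name that looks like a qualified
    // name (contains '.') is rejected, as in the error reported in this issue.
    static FieldRefModel checked(String name) {
        if (name.contains(".")) {
            throw new UnsupportedOperationException(String.format(
                "Unhandled field reference \"%s\"; a field reference identifier"
                    + " must not have the form of a qualified name", name));
        }
        return new FieldRefModel(name);
    }

    // Models getWithQuotedRef(...): the whole name is one quoted segment,
    // so dots stay part of the name and nothing is checked.
    static FieldRefModel quoted(String name) {
        return new FieldRefModel(name);
    }

    String name() {
        return name;
    }
}
```

With the sample file, the {{select *}} path corresponds to {{checked("rk.q")}}, which throws, while the explicit projection corresponds to {{quoted("rk.q")}}, which keeps {{rk.q}} as a single name.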

A nested field may be selected in several ways:
{{t.m.c}} or {{t.m\['c'\]}}.
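These forms differ in how the identifier is split into name segments. A minimal, hypothetical tokenizer ({{PathParser}} is an illustrative name; Drill's real parsing is done by its Calcite-based SQL parser, which handles many more cases such as escapes and array indexes) shows that an unquoted dot separates segments, while a backquoted name or a {{\['...'\]}} subscript keeps its dots inside a single segment:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified tokenizer for compound identifiers such as
// t.m.c, t.m['c'] or t.m.`a.b`. Not Drill's parser.
final class PathParser {
    static List<String> parse(String path) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < path.length()) {
            char ch = path.charAt(i);
            if (ch == '`') {
                // Backquoted name: everything up to the closing backquote,
                // dots included, belongs to the current segment.
                int end = path.indexOf('`', i + 1);
                if (end < 0) throw new IllegalArgumentException("unclosed backquote");
                current.append(path, i + 1, end);
                i = end + 1;
            } else if (ch == '[' && i + 1 < path.length() && path.charAt(i + 1) == '\'') {
                // Subscript ['...']: its content is one literal segment.
                flush(segments, current);
                int end = path.indexOf("']", i + 2);
                if (end < 0) throw new IllegalArgumentException("unclosed ['");
                segments.add(path.substring(i + 2, end));
                i = end + 2;
            } else if (ch == '.') {
                // Unquoted dot: segment separator.
                flush(segments, current);
                i++;
            } else {
                current.append(ch);
                i++;
            }
        }
        flush(segments, current);
        return segments;
    }

    // Emits the pending segment, if any, and resets the buffer.
    private static void flush(List<String> segments, StringBuilder current) {
        if (current.length() > 0) {
            segments.add(current.toString());
            current.setLength(0);
        }
    }
}
```

For example, {{t.m.`a.b`}} and {{t.m\['a.b'\]}} both yield the segments {{t, m, a.b}}, whereas {{t.m.a.b}} yields {{t, m, a, b}}.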
Without the field-name check, the query
{code:sql}
select t.m.`a.b`, t.m.a.b, t.m['a.b'] from test_table t
{code}
returns the correct result.
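Resolving the segments of each expression against the sample record shows why all three return the expected value. This is a sketch assuming the record is held as nested {{java.util.Map}} objects (Drill itself works on value vectors) and that the table alias {{t}} has already been stripped; {{NestedLookup}} is a hypothetical name:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolves name segments against a record stored as
// nested maps. Drill does not store records this way.
final class NestedLookup {
    static Object resolve(Map<String, Object> record, List<String> segments) {
        Object current = record;
        for (String segment : segments) {
            if (!(current instanceof Map)) {
                return null; // path continues below a scalar value
            }
            current = ((Map<?, ?>) current).get(segment);
            if (current == null) {
                return null; // no such field
            }
        }
        return current;
    }

    // The sample record: {"rk.q": "a", "m": {"a.b":"1", "a":{"b":"2"}, "c":"3"}}
    static Map<String, Object> sampleRecord() {
        Map<String, Object> a = new HashMap<>();
        a.put("b", "2");
        Map<String, Object> m = new HashMap<>();
        m.put("a.b", "1");
        m.put("a", a);
        m.put("c", "3");
        Map<String, Object> record = new HashMap<>();
        record.put("rk.q", "a");
        record.put("m", m);
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> r = sampleRecord();
        // t.m.`a.b` and t.m['a.b'] -> segments [m, a.b]
        System.out.println(resolve(r, Arrays.asList("m", "a.b"))); // 1
        // t.m.a.b -> segments [m, a, b]
        System.out.println(resolve(r, Arrays.asList("m", "a", "b"))); // 2
    }
}
```

So {{t.m.`a.b`}} and {{t.m\['a.b'\]}} read the key {{a.b}} of {{m}}, while {{t.m.a.b}} reads {{b}} inside the nested map {{a}}.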
MySQL, for example, also allows quoted field names that contain dots.

The preferred solution is to remove the check for fields with dots.
However, a user may forget to quote a field name that contains dots, so the query may return a result
that the user does not expect.
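The risk can be illustrated with the same kind of lookup. Assuming a missing path simply yields null (an assumption of this hypothetical {{ForgottenQuotes}} sketch, not a statement about Drill's exact behaviour), forgetting the backquotes around {{rk.q}} raises no error; the query silently reads a different, non-existent path:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: without the check, a forgotten pair of
// backquotes changes which path is read instead of raising an error.
final class ForgottenQuotes {
    // Walks nested maps; any missing or non-map step yields null (assumption).
    static Object resolve(Map<String, Object> record, List<String> segments) {
        Object current = record;
        for (String segment : segments) {
            if (!(current instanceof Map)) return null;
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("rk.q", "a"); // top-level key that itself contains a dot

        // select `rk.q` ...: one segment, the value is found
        System.out.println(resolve(record, Arrays.asList("rk.q"))); // a
        // select rk.q ...: two segments (map rk, child q), nothing found
        System.out.println(resolve(record, Arrays.asList("rk", "q"))); // null
    }
}
```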

Another solution is to add a session option that allows fields with dots, and to check the field name
or not depending on this option. By default this option will be disabled (the same behaviour as now),
so a user who enables it takes responsibility for queries with forgotten quotes.
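A sketch of how such an option could gate the existing check. The option name {{hypothetical.allow_dots_in_field_names}} and the {{SessionOptions}} class are placeholders, not existing Drill options or APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed session option. The option name and
// the SessionOptions class are illustrative, not Drill APIs.
final class DottedNameCheck {
    static final String ALLOW_DOTS_OPTION = "hypothetical.allow_dots_in_field_names";

    static final class SessionOptions {
        private final Map<String, Boolean> values = new HashMap<>();

        void set(String name, boolean value) {
            values.put(name, value);
        }

        // Unset options read as false, i.e. the option is disabled by default.
        boolean get(String name) {
            return values.getOrDefault(name, false);
        }
    }

    // Applies the existing dot check only while the option is disabled.
    static String validateFieldName(String name, SessionOptions options) {
        if (!options.get(ALLOW_DOTS_OPTION) && name.contains(".")) {
            throw new UnsupportedOperationException(String.format(
                "Unhandled field reference \"%s\"; a field reference identifier"
                    + " must not have the form of a qualified name", name));
        }
        return name;
    }
}
```

With a fresh {{SessionOptions}}, {{validateFieldName("rk.q", options)}} throws as today; after {{options.set(ALLOW_DOTS_OPTION, true)}} it returns {{rk.q}}.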


> Dots in identifier are not escaped correctly
> --------------------------------------------
>
>                 Key: DRILL-4264
>                 URL: https://issues.apache.org/jira/browse/DRILL-4264
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>            Reporter: Alex
>            Assignee: Volodymyr Vysotskyi
>
> If you have some json data like this...
> {code:javascript}
>     {
>       "0.0.1":{
>         "version":"0.0.1",
>         "date_created":"2014-03-15"
>       },
>       "0.1.2":{
>         "version":"0.1.2",
>         "date_created":"2014-05-21"
>       }
>     }
> {code}
> ... there is no way to select any of the rows, since their identifiers contain dots. When trying to select them, Drill throws the following error:
> Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference "0.0.1"; a field reference identifier must not have the form of a qualified name
> This must be fixed, since there are many JSON data files containing dots in some of the keys (e.g. when specifying version numbers).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
