drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-4264) Dots in identifier are not escaped correctly
Date Thu, 27 Jul 2017 01:10:02 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102548#comment-16102548 ]

Paul Rogers edited comment on DRILL-4264 at 7/27/17 1:09 AM:
-------------------------------------------------------------

Wonderfully detailed analysis! You caught many issues that my quick scan missed.

The solution for Parquet metadata seems good. I'm not an expert in that area, but a few unit
tests will validate the change once you make it. Bumping the version number will solve the
forward/backward compatibility issues (using the mechanism from DRILL-5660).
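To illustrate why the version number matters, here is a minimal Java sketch. It assumes, hypothetically, that an older metadata format stored a nested column name as one dot-joined string while a newer format stores the name as an array of segments. {{ColumnMetadataSketch}}, {{nameSegments}} and {{SEGMENTED_NAMES_VERSION}} are illustrative names, not Drill's metadata classes, and the sketch does not model how DRILL-5660 actually handles unknown versions.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Drill's metadata classes): a version number lets a
 * reader tell an old file, where a column name was stored as one dot-joined
 * string, from a new file, where the name is stored as an array of segments.
 */
final class ColumnMetadataSketch {
  /** Hypothetical first format version that stores names as segment arrays. */
  static final int SEGMENTED_NAMES_VERSION = 2;

  /**
   * Returns the column's name segments.
   *
   * @param version  metadata format version read from the file
   * @param joined   dot-joined name (only meaningful for old versions)
   * @param segments name segments (only present in new versions)
   */
  static List<String> nameSegments(int version, String joined, String[] segments) {
    if (version >= SEGMENTED_NAMES_VERSION) {
      // New format: segments were written explicitly, so a dot inside a
      // segment such as "0.0.1" survives intact.
      return Arrays.asList(segments);
    }
    // Old format: the only option is to split on dots, which is ambiguous
    // for names that themselves contain dots.
    return Arrays.asList(joined.split("\\."));
  }
}
```

With the version check, a new-format entry {{["0.0.1", "version"]}} comes back as exactly those two segments, while an old-format entry {{"0.0.1.version"}} can only be split into four pieces. That ambiguity is why old files have to be recognized by their version rather than silently reinterpreted.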

The {{MaterializedField}} issue is harder. Fortunately, some of the nested-name issues might
not be actual issues.

For example, your example of [ScanBatch.Mutator:362|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java#L225]
should be OK as long as the caller knows to call this method only for top-level columns. This
line is used to build up a record batch during reading, for example in the JSON or Parquet
readers. The problem arises only when the container is a map. In that case, the caller should
call {{AbstractMapVector.addOrGet()}} to add the field, rather than adding it at the top level
using the {{Mutator}}.
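A much-simplified sketch of that rule, assuming a toy schema model: {{SchemaNode}} and its methods are hypothetical and only mirror the idea behind {{AbstractMapVector.addOrGet()}}, not its real signature. The name passed in is always a single leaf name used verbatim, and nesting comes from calling the method on the map's node, never from dots in the name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, much-simplified schema model; NOT Drill's ScanBatch.Mutator
 * or AbstractMapVector API. A name passed to addOrGet() is one leaf name,
 * never a dotted path to be parsed.
 */
final class SchemaNode {
  private final String name;
  private final boolean isMap;
  private final Map<String, SchemaNode> children = new LinkedHashMap<>();

  SchemaNode(String name, boolean isMap) {
    this.name = name;
    this.isMap = isMap;
  }

  /** The top-level container; adding to it plays the role of a top-level column. */
  static SchemaNode root() {
    return new SchemaNode("", true);
  }

  /**
   * Returns the existing child with exactly this leaf name, or adds one.
   * The name is used verbatim: "0.0.1" is one child, not three levels.
   */
  SchemaNode addOrGet(String leafName, boolean childIsMap) {
    if (!isMap) {
      throw new IllegalStateException(name + " is not a map");
    }
    return children.computeIfAbsent(leafName, n -> new SchemaNode(n, childIsMap));
  }

  String name() {
    return name;
  }

  int childCount() {
    return children.size();
  }
}
```

For the JSON in this issue, {{SchemaNode.root().addOrGet("0.0.1", true).addOrGet("version", false)}} yields one top-level map named {{0.0.1}} with a child {{version}}, and no dot parsing happens anywhere.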

Are there other cases where the code assembles a path and then tears it down again, or
parses a path?
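The danger of assembling a path and tearing it down again can be shown in a few lines: the naive join/split pair below loses information as soon as a segment contains a dot, while a quoted form round-trips exactly. The class and method names are hypothetical, and the quoting rule (wrap each segment in backticks, double any embedded backtick) is an assumption chosen for illustration, not Drill's documented escaping rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration: naive dot-joining is lossy, while quoting each
 * segment (backticks, with embedded backticks doubled) round-trips exactly.
 */
final class PathRoundTrip {
  static String joinNaive(List<String> segments) {
    return String.join(".", segments);
  }

  static List<String> splitNaive(String path) {
    return Arrays.asList(path.split("\\."));
  }

  static String joinQuoted(List<String> segments) {
    StringBuilder sb = new StringBuilder();
    for (String s : segments) {
      if (sb.length() > 0) sb.append('.');
      sb.append('`').append(s.replace("`", "``")).append('`');
    }
    return sb.toString();
  }

  static List<String> splitQuoted(String path) {
    List<String> out = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    int i = 0;
    while (i < path.length()) {
      // Every segment must open with a backtick.
      if (path.charAt(i) != '`') throw new IllegalArgumentException("expected ` at " + i);
      i++;
      while (true) {
        if (i >= path.length()) throw new IllegalArgumentException("unterminated segment");
        char c = path.charAt(i);
        if (c == '`' && i + 1 < path.length() && path.charAt(i + 1) == '`') {
          cur.append('`'); // doubled backtick is a literal backtick
          i += 2;
        } else if (c == '`') {
          i++; // closing backtick ends the segment
          break;
        } else {
          cur.append(c); // dots inside a segment are ordinary characters
          i++;
        }
      }
      out.add(cur.toString());
      cur.setLength(0);
      if (i < path.length()) {
        if (path.charAt(i) != '.') throw new IllegalArgumentException("expected . at " + i);
        i++;
        if (i == path.length()) throw new IllegalArgumentException("trailing dot");
      }
    }
    return out;
  }
}
```

Joining {{["0.0.1", "version"]}} naively gives {{0.0.1.version}}, which splits back into four segments; the quoted form {{`0.0.1`.`version`}} splits back into the original two.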

Otherwise, we can find all uses of {{MaterializedField.getPath()}}, verify that they really
only use the leaf name, and replace them with {{getName()}}. The same is true of {{getLastName()}}.
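A small sketch of the {{getPath()}} versus {{getName()}} point, using a hypothetical {{FieldSketch}} class rather than the real {{MaterializedField}}: code that recovers the leaf by cutting the path at the last dot returns {{1}} for a top-level column named {{0.0.1}}, while asking for the name directly returns it intact.

```java
/**
 * Hypothetical stand-in for a field descriptor; NOT Drill's MaterializedField.
 * Callers that only need the leaf name should ask for it directly instead of
 * deriving it from a dotted path string.
 */
final class FieldSketch {
  private final String parentPath; // "" for a top-level column
  private final String name;       // the leaf name, stored verbatim

  FieldSketch(String parentPath, String name) {
    this.parentPath = parentPath;
    this.name = name;
  }

  /** Dot-joined full path: ambiguous when any segment contains a dot. */
  String getPath() {
    return parentPath.isEmpty() ? name : parentPath + "." + name;
  }

  /** The leaf name exactly as given. */
  String getName() {
    return name;
  }

  /** Anti-pattern: recovering the leaf by cutting the path at the last dot. */
  static String leafFromPath(String path) {
    return path.substring(path.lastIndexOf('.') + 1);
  }
}
```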


was (Author: paul-rogers):
Wonderfully detailed analysis! You caught many issues that my quick scan missed.

The solution for Parquet metadata seems good. I'm not an expert in that area, but a few unit
tests will validate the change once you make it. Bumping the version number will solve the
forward/backward compatibility issues (using the mechanism from DRILL-5660).

The {{MaterializedField}} issue is harder. Fortunately, some of the nested-name issues might
not be actual issues.

For example, your example of [ScanBatch.Mutator:362|https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java#L225]
should be OK as long as the caller knows to pass in only the leaf name. This line is used
to build up a record batch during reading, for example in the JSON or Parquet readers. The
problem arises only when the container is a map. In that case, the caller should call
{{AbstractMapVector.addOrGet()}} to add the field, rather than adding it at the top level
using the {{Mutator}}.

Are there other cases where the code assembles a path and then tears it down again, or
parses a path?

Otherwise, we can find all uses of {{MaterializedField.getPath()}}, verify that they really
only use the leaf name, and replace them with {{getName()}}. The same is true of {{getLastName()}}.

> Dots in identifier are not escaped correctly
> --------------------------------------------
>
>                 Key: DRILL-4264
>                 URL: https://issues.apache.org/jira/browse/DRILL-4264
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>            Reporter: Alex
>            Assignee: Volodymyr Vysotskyi
>              Labels: doc-impacting
>
> If you have some JSON data like this...
> {code:javascript}
>     {
>       "0.0.1":{
>         "version":"0.0.1",
>         "date_created":"2014-03-15"
>       },
>       "0.1.2":{
>         "version":"0.1.2",
>         "date_created":"2014-05-21"
>       }
>     }
> {code}
> ... there is no way to select any of the rows, since their identifiers contain dots. When
> trying to select them, Drill throws the following error:
> Error: SYSTEM ERROR: UnsupportedOperationException: Unhandled field reference "0.0.1"; a field reference identifier must not have the form of a qualified name
> This must be fixed, since many JSON data files contain dots in some of their keys (e.g.,
> when specifying version numbers).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
