drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6129) Query fails on nested data type schema change
Date Thu, 01 Feb 2018 17:25:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348951#comment-16348951 ]

ASF GitHub Bot commented on DRILL-6129:
---------------------------------------

Github user sachouche commented on the issue:

    https://github.com/apache/drill/pull/1106
  
    Thanks @paul-rogers for the information. I went through the PR and noticed the following:
    
    - BatchSchema invokes MaterializedField.isEquivalent()
    - With my fix, both methods (RecordBatchLoader isSame(...) and MaterializedField.isEquivalent()) consider nested columns, but they differ in several ways:
    
    1) RecordBatchLoader requires sameness because this information is used to reuse the value vectors: if the old and new batches are deemed the same, the value vectors are reloaded using the load(...) API. The metadata must therefore match, or a runtime exception will occur.
    
    2) The RecordBatchLoader isSame(...) API compares two different Java objects: a SerializedField (obtained from protobufs) and the MaterializedField of an already materialized value vector.
    
    3) The RecordBatchLoader isSame(...) API tolerates unordered fields (within the same level), but the MaterializedField.isEquivalent() method does not.
    
    4) MaterializedField.isEquivalent() ignores hidden columns such as "$bits" and "$offsets", but RecordBatchLoader isSame(...) does not.
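    
    To make differences 3) and 4) concrete, here is a rough, self-contained sketch. It does not use Drill's classes: Field, isEquivalentOrdered and isSameUnordered are hypothetical stand-ins, and treating any name that starts with "$" as a hidden column is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SchemaCompareSketch {

    // Hypothetical stand-in for column metadata; NOT Drill's MaterializedField.
    static final class Field {
        final String name;
        final String type;
        final List<Field> children;

        Field(String name, String type, List<Field> children) {
            this.name = name;
            this.type = type;
            this.children = children;
        }

        static Field of(String name, String type, Field... children) {
            return new Field(name, type, Arrays.asList(children));
        }
    }

    // Assumption for this sketch: hidden columns are those whose name starts with "$".
    static List<Field> visibleChildren(Field f) {
        List<Field> out = new ArrayList<>();
        for (Field c : f.children) {
            if (!c.name.startsWith("$")) {
                out.add(c);
            }
        }
        return out;
    }

    // Order-sensitive nested comparison that skips hidden children.
    static boolean isEquivalentOrdered(Field a, Field b) {
        if (!a.name.equals(b.name) || !a.type.equals(b.type)) {
            return false;
        }
        List<Field> ca = visibleChildren(a);
        List<Field> cb = visibleChildren(b);
        if (ca.size() != cb.size()) {
            return false;
        }
        for (int i = 0; i < ca.size(); i++) {
            if (!isEquivalentOrdered(ca.get(i), cb.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Order-insensitive nested comparison that also compares hidden children.
    static boolean isSameUnordered(Field a, Field b) {
        if (!a.name.equals(b.name) || !a.type.equals(b.type)) {
            return false;
        }
        if (a.children.size() != b.children.size()) {
            return false;
        }
        Map<String, Field> byName = new HashMap<>();
        for (Field c : b.children) {
            byName.put(c.name, c);
        }
        for (Field c : a.children) {
            // remove() so that each child of b can be matched at most once
            Field other = byName.remove(c.name);
            if (other == null || !isSameUnordered(c, other)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Field x = Field.of("x", "BIGINT");
        Field y = Field.of("y", "BIGINT");
        Field xy = Field.of("row", "MAP", x, y);
        Field yx = Field.of("row", "MAP", y, x);
        System.out.println(isEquivalentOrdered(xy, yx)); // false: child order differs
        System.out.println(isSameUnordered(xy, yx));     // true: order is tolerated

        Field withHidden = Field.of("col", "VARCHAR", Field.of("$offsets$", "UINT4"));
        Field plain = Field.of("col", "VARCHAR");
        System.out.println(isEquivalentOrdered(withHidden, plain)); // true: hidden child skipped
        System.out.println(isSameUnordered(withHidden, plain));     // false: hidden child counted
    }
}
```

    Both checks recurse into children, so a nested change such as a BIGINT column becoming a MAP is reported as a mismatch by either one; they disagree only on ordering and hidden columns.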
    
    I think that, moving forward, the best way to prevent bugs related to schema changes is to maintain a document that establishes all the rules. This would allow QA to refine their tests and catch current limitations.



> Query fails on nested data type schema change
> ---------------------------------------------
>
>                 Key: DRILL-6129
>                 URL: https://issues.apache.org/jira/browse/DRILL-6129
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - CLI
>    Affects Versions: 1.10.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> Use-Case -
>  * Assume two parquet files with similar schemas except for a nested column
>  * Schema file1
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional int64 child_field
>  * Schema file2
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional group child_field
>  ******* optional int64 child_field_f1
>  ******* optional int64 child_field_f1
>  * Essentially child_field changed from an int64 to a group of fields
>  
> Observed Query Failure
> select * from <file1 and file2>;
> Error: Unexpected RuntimeException: java.lang.IllegalArgumentException: The field $bits$(UINT1:REQUIRED) doesn't match the provided metadata major_type {
>   minor_type: MAP
>   mode: REQUIRED
> Note that selecting one file at a time succeeds which seems to indicate the issue has to do with the schema change logic.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
