drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6129) Query fails on nested data type schema change
Date Thu, 01 Feb 2018 03:16:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347948#comment-16347948 ]

ASF GitHub Bot commented on DRILL-6129:
---------------------------------------

GitHub user sachouche opened a pull request:

    https://github.com/apache/drill/pull/1106

    DRILL-6129: Fixed query failure due to nested column data type change

    Problem Description -
    - The Drillbit was able to successfully send batches containing different metadata (for nested columns)
    - This was the case whether one or multiple scanners were involved
    - The issue occurred within the client, where value vectors are cached across batches
    - The load(...) API is responsible for updating value vectors when a new batch arrives
    - The RecordBatchLoader class is used to detect schema changes; when one is detected, the previous value vectors are discarded and new ones are created
    - The current implementation has a bug: only first-level columns are compared
    
    Fix -
    - The fix is to improve the schema diff logic by including nested columns
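
The difference between a first-level comparison and one that includes nested columns can be sketched as follows. This is only an illustration and not the code in the pull request: Drill itself is written in Java, and the `Field` class and the `same_top_level` / `same_deep` helpers are hypothetical names invented for this example. The two sample schemas are a simplified version of the file1/file2 use case from the issue.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a column's metadata (not a Drill class)."""
    name: str
    minor_type: str
    mode: str = "OPTIONAL"
    children: Tuple["Field", ...] = ()


def same_top_level(a: Field, b: Field) -> bool:
    """Compare only the column itself and ignore its children."""
    return (a.name, a.minor_type, a.mode) == (b.name, b.minor_type, b.mode)


def same_deep(a: Field, b: Field) -> bool:
    """Compare the column and, recursively, every nested column.

    Children are matched by position, which is an assumption of this sketch.
    """
    if not same_top_level(a, b):
        return False
    if len(a.children) != len(b.children):
        return False
    return all(same_deep(x, y) for x, y in zip(a.children, b.children))


# file1: child_field is a plain int64 (BIGINT)
file1_field2 = Field("field2", "MAP", children=(
    Field("field2.1", "LIST", children=(
        Field("child_field", "BIGINT"),
    )),
))

# file2: child_field became a group (MAP) with its own child
file2_field2 = Field("field2", "MAP", children=(
    Field("field2.1", "LIST", children=(
        Field("child_field", "MAP", children=(
            Field("child_field_f1", "BIGINT"),
        )),
    )),
))

# A first-level check sees no difference; the recursive check does.
shallow_equal = same_top_level(file1_field2, file2_field2)
deep_equal = same_deep(file1_field2, file2_field2)
```

With the first-level check alone, `field2` looks unchanged between the two files, so cached vectors would be reused; the recursive check reports a schema change because `child_field` differs several levels down.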

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachouche/drill DRILL-6129

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1106
    
----
commit 9ffb41f509cd2531e7f3cdf89a66605ec0fdf7a4
Author: Salim Achouche <sachouche2@...>
Date:   2018-02-01T02:59:58Z

    DRILL-6129: Fixed query failure due to nested column data type change

----


> Query fails on nested data type schema change
> ---------------------------------------------
>
>                 Key: DRILL-6129
>                 URL: https://issues.apache.org/jira/browse/DRILL-6129
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - CLI
>    Affects Versions: 1.10.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> Use-Case -
>  * Assume two parquet files with similar schemas except for a nested column
>  * Schema file1
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional int64 child_field
>  * Schema file2
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional group child_field
>  ******* optional int64 child_field_f1
>  ******* optional int64 child_field_f1
>  * Essentially child_field changed from an int64 to a group of fields
>  
> Observed Query Failure
> select * from <file1 and file2>;
> Error: Unexpected RuntimeException: java.lang.IllegalArgumentException: The field $bits$(UINT1:REQUIRED) doesn't match the provided metadata major_type {
>   minor_type: MAP
>   mode: REQUIRED
> Note that selecting one file at a time succeeds, which seems to indicate that the issue lies in the schema change logic.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
