[ https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347948#comment-16347948 ]
ASF GitHub Bot commented on DRILL-6129:
---------------------------------------
GitHub user sachouche opened a pull request:
https://github.com/apache/drill/pull/1106
DRILL-6129: Fixed query failure due to nested column data type change
Problem Description -
- The Drillbit was able to successfully send batches containing different metadata for nested columns
- This was the case whether one or multiple scanners were involved
- The issue happened within the client, where value vectors are cached across batches
- The load(...) API is responsible for updating value vectors when a new batch arrives
- The RecordBatchLoader class is used to detect schema changes; when one is detected, the previous value vectors are discarded and new ones are created
- The current implementation has a bug: only first-level columns are compared
Fix -
- The fix is to improve the schema diff logic by including nested columns
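
To illustrate the difference between a first-level and a nested schema comparison, here is a minimal sketch. It is not the code from this pull request and does not use Drill's classes: Field, sameTopLevel and sameDeep are hypothetical names, the type labels are plain strings, and the Parquet nesting from the issue is simplified (the "repeated group list" level is omitted). Children are assumed to be compared in order.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of column metadata; NOT Drill's actual classes.
public class SchemaDiffSketch {

    public static final class Field {
        final String name;
        final String type;          // illustrative labels such as "BIGINT", "MAP", "LIST"
        final List<Field> children; // empty for leaf columns

        public Field(String name, String type, Field... children) {
            this.name = name;
            this.type = type;
            this.children = Arrays.asList(children);
        }
    }

    // Compares only the name and type of the field itself.
    // This mirrors the first-level-only comparison described as the bug.
    public static boolean sameTopLevel(Field a, Field b) {
        return a.name.equals(b.name) && a.type.equals(b.type);
    }

    // Compares the field and then, recursively, every nested child (in order).
    public static boolean sameDeep(Field a, Field b) {
        if (!sameTopLevel(a, b) || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!sameDeep(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    // field2 as described for file1: child_field is a leaf int64.
    public static Field file1Field2() {
        return new Field("field2", "MAP",
            new Field("field2.1", "LIST",
                new Field("element", "MAP",
                    new Field("child_field", "BIGINT"))));
    }

    // field2 as described for file2: child_field is a group of fields.
    // The two children carry the same name because the issue lists them that way.
    public static Field file2Field2() {
        return new Field("field2", "MAP",
            new Field("field2.1", "LIST",
                new Field("element", "MAP",
                    new Field("child_field", "MAP",
                        new Field("child_field_f1", "BIGINT"),
                        new Field("child_field_f1", "BIGINT")))));
    }

    public static void main(String[] args) {
        Field a = file1Field2();
        Field b = file2Field2();
        System.out.println("top-level equal: " + sameTopLevel(a, b)); // true
        System.out.println("deep equal: " + sameDeep(a, b));          // false
    }
}
```

With the two field2 definitions from the issue, the first-level check reports no difference while the recursive check does, which is the case where cached value vectors would be wrongly reused.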
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sachouche/drill DRILL-6129
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1106.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1106
----
commit 9ffb41f509cd2531e7f3cdf89a66605ec0fdf7a4
Author: Salim Achouche <sachouche2@...>
Date: 2018-02-01T02:59:58Z
DRILL-6129: Fixed query failure due to nested column data type change
----
> Query fails on nested data type schema change
> ---------------------------------------------
>
> Key: DRILL-6129
> URL: https://issues.apache.org/jira/browse/DRILL-6129
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - CLI
> Affects Versions: 1.10.0
> Reporter: salim achouche
> Assignee: salim achouche
> Priority: Minor
> Fix For: 1.13.0
>
>
> Use-Case -
> * Assume two parquet files with similar schemas except for a nested column
> * Schema file1
> ** int64 field1
> ** optional group field2
> *** optional group field2.1 (LIST)
> **** repeated group list
> ***** optional group element
> ****** optional int64 child_field
> * Schema file2
> ** int64 field1
> ** optional group field2
> *** optional group field2.1 (LIST)
> **** repeated group list
> ***** optional group element
> ****** optional group child_field
> ******* optional int64 child_field_f1
> ******* optional int64 child_field_f1
> * Essentially child_field changed from an int64 to a group of fields
>
> Observed Query Failure
> select * from <file1 and file2>;
> Error: Unexpected RuntimeException: java.lang.IllegalArgumentException: The field $bits$(UINT1:REQUIRED) doesn't match the provided metadata major_type {
> minor_type: MAP
> mode: REQUIRED
> Note that selecting one file at a time succeeds, which suggests the issue lies in the schema change logic.
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)