impala-reviews mailing list archives

From Nathan Salmon <nathan.gsal...@gmail.com>
Subject Re: [Impala-ASF-CR] IMPALA-4675: Case-insensitive matching of Parquet fields.
Date Fri, 03 Mar 2017 18:42:21 GMT
Sure deal.  Thanks for keeping tabs and driving it over the line.  Learned a
good deal about the test harness, so future contributions should be more
well rounded.

Happy Friday,

NS

On Fri, Mar 3, 2017 at 1:27 PM, Alex Behm <alex.behm@cloudera.com> wrote:

> Nathan, thanks for identifying this bug and your contribution!
>
> On Fri, Mar 3, 2017 at 2:20 AM, Impala Public Jenkins (Code Review) <
> gerrit@cloudera.org> wrote:
>
>> Impala Public Jenkins has submitted this change and it was merged.
>>
>> Change subject: IMPALA-4675: Case-insensitive matching of Parquet fields.
>> ......................................................................
>>
>>
>> IMPALA-4675: Case-insensitive matching of Parquet fields.
>>
>> The query option PARQUET_FALLBACK_SCHEMA_RESOLUTION
>> allows matching of Parquet fields by name instead of by
>> index (the default).
>>
>> Parquet column names are case sensitive, but Impala treats
>> db/table/column/field names as case-insensitive. Today,
>> there is no way to select Parquet columns with mixed
>> casing via SQL using the name-based field resolution policy.
>>
>> This patch changes the matching of Parquet fields to be
>> case-insensitive.
>>
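The matching change described above can be illustrated with a minimal C++ sketch. It is not the code from `be/src/exec/parquet-metadata-utils.cc`. The helpers `ToLower` and `FindChildByName` are hypothetical, and a Parquet group's children are modeled as a plain vector of field names. The sketch assumes that "case-insensitive" means comparing names after ASCII lowercasing.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns a copy of `s` with ASCII letters lowercased.
std::string ToLower(const std::string& s) {
  std::string out(s);
  std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return out;
}

// Hypothetical helper: returns the index of the first child field whose name
// equals `name` when both are lowercased, or -1 if no child matches.
int FindChildByName(const std::vector<std::string>& children,
                    const std::string& name) {
  const std::string target = ToLower(name);
  for (std::size_t i = 0; i < children.size(); ++i) {
    if (ToLower(children[i]) == target) return static_cast<int>(i);
  }
  return -1;
}
```

With this, a query column written as `int_array` would resolve to a file field named `Int_Array`. `std::tolower` in the default locale does not fold non-ASCII characters. If a file contained two fields that differ only in case, this sketch returns the first one. The message does not say how the merged patch handles that situation.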
>> Testing:
>> - Modified the data files backing complextypestbl
>>   to contain fields with mixed casing.
>> - Several existing tests run against this table,
>>   including the test for name-based resolution.
>> - I confirmed that without this fix, the existing
>>   name-based resolution tests fail on the modified
>>   data files.
>> - I locally ran test_scanners.py and test_nested_types.py
>>   on exhaustive with this fix.
>>
>> Change-Id: I87395f84ba29b4c3d8e41be1ea4e89e500b8a9f4
>> Reviewed-on: http://gerrit.cloudera.org:8080/5891
>> Reviewed-by: Alex Behm <alex.behm@cloudera.com>
>> Tested-by: Impala Public Jenkins
>> ---
>> M be/src/exec/parquet-metadata-utils.cc
>> M be/src/exec/parquet-metadata-utils.h
>> M testdata/ComplexTypesTbl/nonnullable.avsc
>> M testdata/ComplexTypesTbl/nonnullable.json
>> M testdata/ComplexTypesTbl/nonnullable.parq
>> M testdata/ComplexTypesTbl/nullable.avsc
>> M testdata/ComplexTypesTbl/nullable.json
>> M testdata/ComplexTypesTbl/nullable.parq
>> M testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
>> M tests/query_test/test_scanners.py
>> 10 files changed, 71 insertions(+), 76 deletions(-)
>>
>> Approvals:
>>   Impala Public Jenkins: Verified
>>   Alex Behm: Looks good to me, approved
>>
>>
>>
>> --
>> To view, visit http://gerrit.cloudera.org:8080/5891
>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>>
>> Gerrit-MessageType: merged
>> Gerrit-Change-Id: I87395f84ba29b4c3d8e41be1ea4e89e500b8a9f4
>> Gerrit-PatchSet: 7
>> Gerrit-Project: Impala-ASF
>> Gerrit-Branch: master
>> Gerrit-Owner: Nathan Salmon <nathan.gsalmon@gmail.com>
>> Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
>> Gerrit-Reviewer: Impala Public Jenkins
>> Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
>> Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
>> Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
>> Gerrit-Reviewer: Nathan Salmon <nathan.gsalmon@gmail.com>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "impala-cr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to impala-cr+unsubscribe@cloudera.com.
>> For more options, visit https://groups.google.com/a/cloudera.com/d/optout.
>>
>
>
