impala-reviews mailing list archives

From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4675: Case-insensitive matching of Parquet fields.
Date Fri, 03 Mar 2017 02:13:35 GMT
Hello Lars Volker,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5891

to look at the new patch set (#5).

Change subject: IMPALA-4675: Case-insensitive matching of Parquet fields.
......................................................................

IMPALA-4675: Case-insensitive matching of Parquet fields.

The query option PARQUET_FALLBACK_SCHEMA_RESOLUTION
allows matching of Parquet fields by name instead of by
index (the default).

Parquet column names are case-sensitive, but Impala treats
db/table/column/field names as case-insensitive. Today, there
is no way to select Parquet columns with mixed casing via SQL
using the name-based field resolution policy.

This patch changes the matching of Parquet fields to be
case-insensitive.
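
As a rough illustration of the idea only (this is not the code in
parquet-metadata-utils.cc), name-based resolution with a
case-insensitive comparison could be sketched as below. The function
name find_field_index and the sample field names are hypothetical.

```python
from typing import Optional, Sequence


def find_field_index(file_fields: Sequence[str], table_field: str) -> Optional[int]:
    """Return the index of the first file field whose name equals
    table_field when case is ignored, or None if nothing matches."""
    wanted = table_field.lower()
    for idx, name in enumerate(file_fields):
        # Compare lowercased names so "Int_Array" matches "int_array".
        if name.lower() == wanted:
            return idx
    return None


# Hypothetical field names as they might appear in a Parquet file schema.
parquet_fields = ["ID", "Int_Array", "nested_Struct"]

# The table side uses lowercase names; the file side uses mixed casing.
print(find_field_index(parquet_fields, "int_array"))
print(find_field_index(parquet_fields, "missing"))
```

The sketch returns the first match, so a file containing two fields
that differ only in case would resolve to the earlier one; how the
patch itself handles that situation is not stated here.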

Testing:
- Modified the data files backing complextypestbl
  to contain fields with mixed casing.
- Several existing tests run against this table,
  including the test for name-based resolution.
- I confirmed that without this fix, the existing
  name-based resolution tests fail on the modified
  data files.
- I locally ran test_scanners.py and test_nested_types.py
  on exhaustive with this fix.

Change-Id: I87395f84ba29b4c3d8e41be1ea4e89e500b8a9f4
---
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M testdata/ComplexTypesTbl/nonnullable.avsc
M testdata/ComplexTypesTbl/nonnullable.json
M testdata/ComplexTypesTbl/nonnullable.parq
M testdata/ComplexTypesTbl/nullable.avsc
M testdata/ComplexTypesTbl/nullable.json
M testdata/ComplexTypesTbl/nullable.parq
M testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M tests/query_test/test_scanners.py
10 files changed, 71 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/5891/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5891
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I87395f84ba29b4c3d8e41be1ea4e89e500b8a9f4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Nathan Salmon <nathan.gsalmon@gmail.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Nathan Salmon <nathan.gsalmon@gmail.com>
