impala-dev mailing list archives

From "Skye Wanderman-Milne (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2853: introduce PARQUET_RESOLVE_BY_NAME query option
Date Mon, 21 Mar 2016 18:28:46 GMT
Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-2853: introduce PARQUET_RESOLVE_BY_NAME query option
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/2384/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 2025:   if (col_type == NULL) DCHECK_EQ(next_idx, 0);
> with the new way the code is structured, this might be more intuitive writt
Done


http://gerrit.cloudera.org:8080/#/c/2384/3/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 599: a value >= # 
> how about just simplify:
Done


http://gerrit.cloudera.org:8080/#/c/2384/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 169:   42: optional bool parquet_resolve_by_name = false
> while i see your point about resolve-by-id needing a fallback, I think this
Given that the only meaningful resolution orderings are:
* id, name
* id, ordinal
* name
* ordinal

And that field IDs don't actually exist yet, I think we should keep this option (or change
it to resolve_by_ordinal if that's somehow better), and later add a parquet_resolve_by_field_id
option as well. If we get field IDs in by C6, we can rename this option to parquet_resolve_legacy_files_by_name
or something.

At the very least, even if field IDs aren't implemented by C6, we can still rename this option
if we come up with something better.
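For illustration, here is a minimal C++ sketch of the four orderings listed above. It assumes hypothetical FileField/TableColumn types, not Impala's actual scanner or descriptor classes. Field IDs are modeled as optional because, as noted, they don't exist yet. The ID-based modes fall back only when the file carries no field IDs at all (a legacy file). They do not fall back per column, because that could silently match a renamed column by name.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one top-level field in a Parquet file schema.
struct FileField {
  std::optional<int> field_id;  // absent in legacy files written without IDs
  std::string name;
};

// Hypothetical description of the table column being resolved.
struct TableColumn {
  std::optional<int> field_id;
  std::string name;
  int ordinal;  // position of the column in the table schema
};

// The four meaningful orderings from the review comment.
enum class Mode { kIdThenName, kIdThenOrdinal, kName, kOrdinal };

std::optional<std::size_t> ById(const TableColumn& col,
                                const std::vector<FileField>& fields) {
  if (!col.field_id) return std::nullopt;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (fields[i].field_id == col.field_id) return i;
  }
  return std::nullopt;
}

std::optional<std::size_t> ByName(const TableColumn& col,
                                  const std::vector<FileField>& fields) {
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (fields[i].name == col.name) return i;
  }
  return std::nullopt;
}

std::optional<std::size_t> ByOrdinal(const TableColumn& col,
                                     const std::vector<FileField>& fields) {
  if (col.ordinal < 0 || static_cast<std::size_t>(col.ordinal) >= fields.size()) {
    return std::nullopt;
  }
  return static_cast<std::size_t>(col.ordinal);
}

// Returns the index of the file field backing `col`, or nullopt if the column
// is missing from this file. ID-based modes use the fallback strategy only
// when no field in the file has an ID.
std::optional<std::size_t> Resolve(const TableColumn& col,
                                   const std::vector<FileField>& fields,
                                   Mode mode) {
  const bool file_has_ids = std::any_of(
      fields.begin(), fields.end(),
      [](const FileField& f) { return f.field_id.has_value(); });
  switch (mode) {
    case Mode::kIdThenName:
      return file_has_ids ? ById(col, fields) : ByName(col, fields);
    case Mode::kIdThenOrdinal:
      return file_has_ids ? ById(col, fields) : ByOrdinal(col, fields);
    case Mode::kName:
      return ByName(col, fields);
    case Mode::kOrdinal:
      return ByOrdinal(col, fields);
  }
  return std::nullopt;
}
```

Under this model, a boolean option such as parquet_resolve_by_name would select between Mode::kName and Mode::kOrdinal today. A later field-ID option would switch to the corresponding "ID then ..." mode.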


http://gerrit.cloudera.org:8080/#/c/2384/3/testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test:

Line 55: '/test-warehouse/nested_resolution_by_name_test_parquet'
> needs $FILESYSTEM_PREFIX
Done


Line 170: ====
> any way to test the map key/value logic?
One way would be to generate custom files with switched and renamed fields. Alternatively, with some
light refactoring, it should be possible to unit test this case (and others). I think the only
non-trivial change would be changing the table descriptor to contain a single root record type
that has all the column types as children, instead of special-casing the table-level columns.
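To illustrate the refactoring idea, here is a minimal C++ sketch. It assumes a hypothetical SchemaNode type rather than Impala's real ColumnType or table descriptor classes. Once the table is a single root record whose children are the columns, resolving by name becomes one loop over nesting levels with no table-level special case. A synthetic file schema with switched or renamed fields can then be built in memory for a unit test, without writing Parquet files.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical schema node: the table itself is the root record, and
// top-level columns are simply its children (C++17 allows the
// incomplete-type vector member).
struct SchemaNode {
  std::string name;
  std::vector<SchemaNode> children;  // empty for scalar leaves
};

// Walks `path` (one name per nesting level) down from the root record and
// returns the child index chosen at each level, or nullopt if any name is
// missing from the file schema.
std::optional<std::vector<std::size_t>> ResolvePathByName(
    const SchemaNode& root, const std::vector<std::string>& path) {
  std::vector<std::size_t> indices;
  const SchemaNode* node = &root;
  for (const std::string& name : path) {
    const SchemaNode* next = nullptr;
    for (std::size_t i = 0; i < node->children.size(); ++i) {
      if (node->children[i].name == name) {
        indices.push_back(i);
        next = &node->children[i];
        break;
      }
    }
    if (next == nullptr) return std::nullopt;  // field absent or renamed
    node = next;
  }
  return indices;
}
```

A top-level column such as {"b"} and a nested field such as {"s", "x"} go through the same code path. The same would apply to map key/value children if they were modeled as ordinary child nodes, although how Parquet map layouts should be matched by name is not settled by this sketch.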

I'll send an email to the dev list about the column type change, since I think this is a good
idea either way. Let me know what you think about unit testing vs generating files for end-to-end
tests. I can do either, but I think unit testing will be better. If it turns out to be a bigger
change than anticipated I'll just generate the files.


http://gerrit.cloudera.org:8080/#/c/2384/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

Line 224: EXECUTE
> maybe call it 'SHELL' since execute has many meanings?
Good idea, done


http://gerrit.cloudera.org:8080/#/c/2384/3/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 240: 
> skip if s3 insert
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2384

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Silvius Rus <srus@cloudera.com>
Gerrit-Reviewer: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-HasComments: Yes
