impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dan Hecht (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4725: Query option to control Parquet array resolution.
Date Sat, 04 Mar 2017 05:23:05 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4725: Query option to control Parquet array resolution.
......................................................................


Patch Set 1: Code-Review+2

(7 comments)

Looks fine, but had a couple of suggestions that you can ignore. If you do implement them,
I can take another look.

http://gerrit.cloudera.org:8080/#/c/6250/1//COMMIT_MSG
Commit Message:

PS1, Line 11: resoution
resolution


PS1, Line 29: resoution
resolution


http://gerrit.cloudera.org:8080/#/c/6250/1/be/src/exec/parquet-metadata-utils.cc
File be/src/exec/parquet-metadata-utils.cc:

PS1, Line 409: 3)
what's this? the number of enum elements? consider defining NUM_ARRAY_ENCODINGS or something
as the last element, and using that here (and see below).


Line 416:   Status one_level_status = Status::OK();
could be:

bool missing_fields[NUM_ARRAY_ENCODINGS];
Status statuses[NUM_ARRAY_ENCODINGS];

then below you can just use those directly by indexing with array_encoding. And use for-loops
for the stuff toward the end of this function.

given that it's very unlikely we'll ever have to add another enum value here, i'm fine with
leaving this alone as well, if that's your preference.


http://gerrit.cloudera.org:8080/#/c/6250/1/be/src/exec/parquet-metadata-utils.h
File be/src/exec/parquet-metadata-utils.h:

Line 144:     GetOrderedArrayEncodings(array_resolution_, &ordered_array_encodings_);
why not do this in the constructor and don't bother saving array_resolution_? That way, there's
no state redundancy.

Alternatively, and I think even better, we could have a static const table that maps from
array_resolution to ordered encodings list which can be used in place of ordered_array_encodings_.
 That way again we won't have the redundancy. And it's better because then there's no question
that the ordering is not something we fiddle with at runtime.

The redundancy isn't so bad here, but we generally have too much redundant state which makes
it harder to reason about the system, so it's good to avoid redundant state.  Without it,
for example, there's no question about what influences the array resolution order.


Line 190:   void GetOrderedArrayEncodings(TParquetArrayResolution::type array_resolution,
could be static, then there's no question that array_resolution is the only input.


http://gerrit.cloudera.org:8080/#/c/6250/1/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 264:   // it non-default in a compatibility breaking release.
why want a jira for that, or else we'll likely forget to do it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4f32e19ec542d4d485154c9d65d0f5e3f9f0a907
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message