impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Date Tue, 28 Feb 2017 21:33:04 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4624: Implement Parquet dictionary filtering
......................................................................


Patch Set 14:

(7 comments)

Flushing out some comments I made while in transit.

I don't have any concerns about correctness, but a few things may be confusing
for people reading the code.

http://gerrit.cloudera.org:8080/#/c/5904/14//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4624: Implement Parquet dictionary filtering
Can you mention the query option in the commit message?


PS14, Line 12: incides
indices


http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 704:     return Status(Substitute("Could not allocate buffer of $0 bytes for Parquet "
Can you use MemTracker::MemLimitExceeded() here to construct the status? It also does some
logging that can be useful to diagnose the failure.


http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 855
Thanks for fixing this. I was talking to Wes McKinney (who works on parquet-cpp) a month or
so ago and he'd been confused about why Impala was writing out encodings it wasn't using.


http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS14, Line 256: GetDictionary
GetDictionaryDecoder() for consistency with the other APIs?


Line 865:     if (!stream_->ReadBytes(data_size, &data_, &status)) return status;
Not your change, but can we just SkipBytes here? That would make the intent clearer.


http://gerrit.cloudera.org:8080/#/c/5904/14/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 243:   public boolean isDeterministic() {
I think we should be careful about the naming and comments here, since this will return true
for many non-deterministic functions - the state of things before this patch is pretty confusing.
The definition of Expr.isConstant() is subtle - the comment on that function tries to define
the current rules.

E.g. UDFs can be non-deterministic but the fe treats them as deterministic (for now). Or now()
isn't really deterministic, but we treat it as such because it shouldn't be re-evaluated within
a query.

This list of functions is really the builtin functions that have some kind of randomisation.
Can you rename it to something like isRandomizedBuiltin() and update the comment to reflect
that?
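
To make the suggestion concrete, here is a minimal standalone sketch of what the renamed
check might look like. It is not the code in the patch: the class name, the method
signature, and the contents of the function list are assumptions for illustration only.
The point is that the method answers a narrow question (is this a builtin with some kind
of randomisation?) rather than claiming anything about determinism in general.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical standalone class; in the patch this logic would live in FunctionCallExpr.
class RandomizedBuiltinSketch {
  // Assumption: an illustrative list only, not the list from the patch.
  private static final Set<String> RANDOMIZED_BUILTINS = Collections.unmodifiableSet(
      new HashSet<>(Arrays.asList("rand", "random", "uuid")));

  /**
   * Returns true only for builtin functions that involve some kind of randomisation.
   * A false result does NOT mean the function is deterministic: UDFs may be
   * non-deterministic, and now() is not listed here either.
   */
  static boolean isRandomizedBuiltin(String fnName, boolean isBuiltin) {
    if (!isBuiltin) return false;
    return RANDOMIZED_BUILTINS.contains(fnName.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(isRandomizedBuiltin("rand", true));
    System.out.println(isRandomizedBuiltin("now", true));
    System.out.println(isRandomizedBuiltin("rand", false));
  }
}
```

With this naming, a caller that must not evaluate such functions more than once can test
for the property directly, and the comment no longer has to define "deterministic".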


-- 
To view, visit http://gerrit.cloudera.org:8080/5904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Matthew Mulder <mmulder@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes
