impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4624: Implement Parquet dictionary filtering
Date Tue, 28 Feb 2017 21:33:04 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4624: Implement Parquet dictionary filtering

Patch Set 14:


Flushing out some comments I made while in transit.

I don't have any concerns about correctness but there were a few things that may be confusing
for people reading the code.
Commit Message:

Line 7: IMPALA-4624: Implement Parquet dictionary filtering
Can you mention the query option in the commit message?

PS14, Line 12: incides
File be/src/exec/

Line 704:     return Status(Substitute("Could not allocate buffer of $0 bytes for Parquet
Can you use MemTracker::MemLimitExceeded() here to construct the status? It also does some
logging that can be useful to diagnose the failure.
File be/src/exec/

Line 855
Thanks for fixing this. I was talking to Wes McKinney (who works on parquet-cpp) a month or
so ago and he'd been confused about why Impala was writing out encodings it wasn't using.
File be/src/exec/

PS14, Line 256: GetDictionary
GetDictionaryDecoder() for consistency with the other APIs?

Line 865:     if (!stream_->ReadBytes(data_size, &data_, &status)) return status;
Not your change, but can we just SkipBytes here? That would make the intent clearer.
File fe/src/main/java/org/apache/impala/analysis/

Line 243:   public boolean isDeterministic() {
I think we should be careful about the naming and comments here, since this will return true
for many non-deterministic functions - the state of things before this patch is pretty confusing.
The definition is Expr.isConstant() is subtle - the comment on that function tries to define
the current rules.

E.g. UDFs can be non-deterministic but the fe treats them as deterministic (for now). Or now()
isn't really deterministic, but we treat it as such because it shouldn't be re-evaluated within
a query.

This list of functions is really the builtin functions that have some kind of randomisation.
Can you rename it to something like isRandomizedBuiltin() and update the comment to reflect

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Matthew Mulder <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message