impala-dev mailing list archives

From "Alex Behm (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-2328 Parquet scan should use min/max stats
Date Sat, 16 Jul 2016 00:16:18 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-2328 Parquet scan should use min/max stats

Patch Set 1:

Thanks for posting your patch!

I have a few suggestions regarding the high-level approach that I'd like to see addressed
before further reviewing/accepting this patch.

Imo, these are the steps for pruning row groups based on min/max:
1. In the Impala Frontend, analyze the predicates assigned to an HdfsScanNode and generate
a list of applicable min predicates as well as max predicates that are going to be evaluated
against a scan tuple.
2. Ship those lists of predicates to the BE for execution (need to change the corresponding
thrift structs).
3. In the Backend, while doing a Parquet scan, create and materialize a min tuple from the
min stats of the current row group and evaluate the list of min predicates against it. Then
do the same with a max tuple for the max predicates. The row group is pruned if any of the
min/max predicates returns false.
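
To make the three steps above concrete, here is a rough sketch in Python. It is an illustration only, not Impala code: `StatPredicate`, `split_predicates`, and `can_prune_row_group` are hypothetical names, the min/max tuples are plain dicts keyed by column name, and only simple column-versus-constant comparisons are handled. It assumes that `<`/`<=` predicates are checked against the min tuple, `>`/`>=` against the max tuple, and that `=` contributes one predicate to each list.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Comparison operators this sketch knows how to use for pruning.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


@dataclass(frozen=True)
class StatPredicate:
    """Hypothetical 'column OP constant' predicate evaluated on a stats tuple."""
    column: str
    op: Callable[[Any, Any], bool]
    constant: Any

    def eval(self, stat_tuple: Dict[str, Any]) -> bool:
        value = stat_tuple.get(self.column)
        if value is None:
            # No stats for this column: be conservative and never prune on it.
            return True
        return self.op(value, self.constant)


def split_predicates(
    conjuncts: List[Tuple[str, str, Any]]
) -> Tuple[List[StatPredicate], List[StatPredicate]]:
    """Step 1 (frontend): derive the min and max predicate lists."""
    min_preds: List[StatPredicate] = []
    max_preds: List[StatPredicate] = []
    for column, op, constant in conjuncts:
        if op in ("<", "<="):
            min_preds.append(StatPredicate(column, OPS[op], constant))
        elif op in (">", ">="):
            max_preds.append(StatPredicate(column, OPS[op], constant))
        elif op == "=":
            # x = c can only match if min <= c and max >= c.
            min_preds.append(StatPredicate(column, operator.le, constant))
            max_preds.append(StatPredicate(column, operator.ge, constant))
        # Any other operator is ignored: it is not used for pruning here.
    return min_preds, max_preds


def can_prune_row_group(
    min_tuple: Dict[str, Any],
    max_tuple: Dict[str, Any],
    min_preds: List[StatPredicate],
    max_preds: List[StatPredicate],
) -> bool:
    """Step 3 (backend): prune if any min or max predicate is false."""
    if any(not p.eval(min_tuple) for p in min_preds):
        return True
    return any(not p.eval(max_tuple) for p in max_preds)


# Step 2 (shipping the lists to the backend) is just a function call here.
min_preds, max_preds = split_predicates([("x", "<", 10), ("y", "=", 7)])
skip = can_prune_row_group({"x": 12, "y": 0}, {"x": 20, "y": 9},
                           min_preds, max_preds)
```

In the usage lines at the bottom, the row group's smallest `x` is 12, so `x < 10` cannot hold for any row and the row group is skipped. Treating missing stats as "cannot prune" keeps the check conservative.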

I will leave a few more detailed comments in the code as to what I think are the right and
not-so-right design choices.

Thanks for working on this!

Gerrit-MessageType: comment
Gerrit-Change-Id: I91de1f4d0fb2a982d06cd344e41901e3bf3c2cea
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Jian Wu <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Jian Wu <>
Gerrit-Reviewer: Michael Ho <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: No