impala-reviews mailing list archives

From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Date Thu, 26 Oct 2017 20:54:59 GMT
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
......................................................................


Patch Set 8:

> > Patch Set 7:
 > >
 > > > > Patch Set 7:
 > >  > >
 > >  > > Perf results:
 > >  > > ...
 > >  >
 > >  > I'm surprised that only a few queries saw significant speedups. Is
 > >  > this in line with what you saw with Parquet runtime filters on
 > >  > TPC-H? Or are we losing a lot by using min/max instead of bloom or
 > >  > in-list style filters?
 > >
 > > Not sure about bloom filters perf, though I can run those numbers
 > > for comparison.
 > 
 > I haven't looked at this patch, but had a question about the
 > design:
 > 
 > Are we still pushing blooms across a join to prevent shuffling of
 > data? Or are we now pushing _only_ min/max?
 > 
 > It seems there is value in pushing both: the bloom for evaluation
 > on the other side of the join to prevent shuffling, and the min/max
 > to push all the way to the scanner to reduce I/O.
 > 
 > Not sure if the patch is already doing this.

Impala only evaluates runtime filters in the scan. Even prior to this patch, the Kudu scanner
was not evaluating bloom filters (and hash joins with Kudu scan targets don't build bloom
filters).
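
To make the selectivity question above concrete, here is a minimal Python sketch of a min-max filter built from the build side of a join and applied at the scan. It is an illustration only, not code from this patch: MinMaxFilter, build_filter and scan_with_filter are hypothetical names, the keys are made-up integers, and the "scan" is just a list comprehension standing in for a range predicate on the scan.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class MinMaxFilter:
    """Hypothetical stand-in for a min-max runtime filter (not an Impala class)."""
    lo: Optional[int] = None
    hi: Optional[int] = None

    def insert(self, key: int) -> None:
        # Widen the range so it covers every join key seen on the build side.
        self.lo = key if self.lo is None else min(self.lo, key)
        self.hi = key if self.hi is None else max(self.hi, key)

    def may_contain(self, key: int) -> bool:
        # An empty filter (no build rows) rejects every probe key.
        if self.lo is None or self.hi is None:
            return False
        return self.lo <= key <= self.hi


def build_filter(build_keys: Iterable[int]) -> MinMaxFilter:
    """Summarize the build side of the join as a single [lo, hi] range."""
    f = MinMaxFilter()
    for k in build_keys:
        f.insert(k)
    return f


def scan_with_filter(rows: List[Tuple[int, str]], f: MinMaxFilter) -> List[Tuple[int, str]]:
    """Keep only rows whose key falls inside the filter's range."""
    return [row for row in rows if f.may_contain(row[0])]


# Assumed sample data: sparse build keys and a wider set of scanned rows.
build_keys = [105, 110, 190]
probe_rows = [(50, "a"), (105, "b"), (150, "c"), (190, "d"), (300, "e")]

f = build_filter(build_keys)
survivors = scan_with_filter(probe_rows, f)

# What an exact membership (in-list style) check would keep, for comparison.
exact = [r for r in probe_rows if r[0] in set(build_keys)]
```

In this sketch the row with key 150 survives the min-max filter even though it has no match on the build side, while an exact membership check drops it. That gap between the two results is the kind of loss the question about min/max versus bloom or in-list filters is asking about; how large it is in practice depends on how sparse the build keys are within their range.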

It certainly could be useful to evaluate bloom filters on the Impala side of a Kudu scan,
but I believe our thinking was that it wasn't worth it to implement that - better to just
wait until bloom filters can be pushed all the way down into Kudu. If bloom filters in Kudu
are a long way off, though, we should maybe reevaluate that.


-- 
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mjacobs@apache.org>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <todd@apache.org>
Gerrit-Comment-Date: Thu, 26 Oct 2017 20:54:59 +0000
Gerrit-HasComments: No
