impala-reviews mailing list archives

From "Thomas Tauber-Marshall (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
Date Thu, 11 Jan 2018 19:12:47 GMT
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8986 )

Change subject: IMPALA-4252: [DOCS] Document min/max filters for Kudu tables
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8986/1/docs/topics/impala_runtime_filtering.xml
File docs/topics/impala_runtime_filtering.xml:

http://gerrit.cloudera.org:8080/#/c/8986/1/docs/topics/impala_runtime_filtering.xml@173
PS1, Line 173: 
             :         For HD
> Done. Because this paragraph is followed by info that's only relevant for B
Actually, the partitioned/broadcast and local/global discussion applies to min-max filters
as well.

I should also add that the long-term plan is to have all filter types supported by all scan
types, so there's no need to single out min-max filters as a specifically Kudu thing (though
of course they only apply to Kudu at the moment).


http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml
File docs/topics/impala_runtime_filtering.xml:

http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml@181
PS2, Line 181: a complete list of relevant values
This is the only part I see that doesn't make sense for min-max filters, as they're not a
'list of values', but then a bloom filter isn't a 'list of values' either.

Maybe rephrase it to something like "A broadcast filter reflects the complete set of relevant
values and can be immediately evaluated..." and "A partitioned filter reflects only the values
processed by one host...", or perhaps use "contains" instead of "reflects".
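
To make the "reflects" wording concrete, here is a minimal Python sketch, not Impala code. It
assumes that per-host partial filters get combined into one complete filter before use; the
names `Bounds`, `bounds_of`, and `merge` and the key values are hypothetical. Each partial
min-max filter reflects only the build-side keys one host processed, and the merged result
reflects the complete set without being a list of values:

```python
from typing import List, Optional, Tuple

# (min, max) of the join keys seen so far, or None when no rows were seen.
Bounds = Optional[Tuple[int, int]]


def bounds_of(values: List[int]) -> Bounds:
    """Summarize one host's build-side keys as a min-max pair."""
    return (min(values), max(values)) if values else None


def merge(a: Bounds, b: Bounds) -> Bounds:
    """Combine two partial filters into one covering both inputs."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


# Hypothetical build-side join keys, split across three hosts.
per_host_keys = [[3, 7], [42, 50], [18]]

# "Partitioned" view: each filter reflects only one host's values.
partitioned = [bounds_of(keys) for keys in per_host_keys]

# "Complete" view: merging the partial filters covers every build-side value.
complete: Bounds = None
for partial in partitioned:
    complete = merge(complete, partial)

print(partitioned)  # [(3, 7), (42, 50), (18, 18)]
print(complete)     # (3, 50)
```

Neither the partial filters nor the merged one enumerates values, which is why "reflects"
reads better than "a complete list" for both min-max and bloom filters.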


http://gerrit.cloudera.org:8080/#/c/8986/2/docs/topics/impala_runtime_filtering.xml@203
PS2, Line 203: These filters are used by Kudu to scan a range of values
             :         for join columns when identifying matching rows within a join query.
I find this sentence confusing, as Kudu isn't identifying the matching rows (Kudu doesn't
even know we're doing a join; it's just scanning values for us).

Maybe say something like "These filters are passed to Kudu to reduce the number of rows returned
to Impala when scanning the probe side of the join."
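
As a rough illustration of that wording, here is a minimal Python sketch, not Impala or Kudu
code; `MinMaxFilter`, `build_filter`, and `scan_probe_side` are hypothetical names. The filter
is built from the build side of the join, and the scan only applies it as a range check,
without knowing anything about the join:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class MinMaxFilter:
    """Hypothetical min-max filter: only the bounds of the build-side join keys."""
    lo: Optional[int] = None
    hi: Optional[int] = None

    def insert(self, value: int) -> None:
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def may_match(self, value: int) -> bool:
        # An empty filter means the build side had no rows, so nothing can join.
        if self.lo is None or self.hi is None:
            return False
        return self.lo <= value <= self.hi


def build_filter(build_keys: Iterable[int]) -> MinMaxFilter:
    """Populate the filter while processing the build side of the join."""
    f = MinMaxFilter()
    for key in build_keys:
        f.insert(key)
    return f


def scan_probe_side(probe_keys: Iterable[int], f: MinMaxFilter) -> List[int]:
    # Stands in for the storage-layer scan: it applies a range predicate only
    # and has no knowledge of the join itself.
    return [key for key in probe_keys if f.may_match(key)]


build_keys = [12, 15, 19]
probe_keys = [1, 12, 14, 19, 25, 40]

f = build_filter(build_keys)
returned = scan_probe_side(probe_keys, f)
print(returned)  # [12, 14, 19]
```

Note that 14 is still returned even though it has no build-side match: the filter only reduces
the rows handed back, and the join itself does the matching.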



-- 
To view, visit http://gerrit.cloudera.org:8080/8986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15d8c952ab5b90e89fdd57640dfb4da882f7ecb2
Gerrit-Change-Number: 8986
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: John Russell <jrussell@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <todd@apache.org>
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:12:47 +0000
Gerrit-HasComments: Yes
