impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Date Mon, 23 Oct 2017 22:32:15 GMT
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
......................................................................


Patch Set 6:

(10 comments)

Few more comments following on from the previous one. I think my primary concern now is whether
we have enough test coverage of edge cases (low memory, extreme values, etc). The code now
seems solid so mainly concerned about preventing future regressions.

http://gerrit.cloudera.org:8080/#/c/7793/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/7793/7//COMMIT_MSG@26
PS7, Line 26: - Updated planner tests.
Might be worth mentioning why the runtime filters were renumbered in all the planner tests


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/exec/filter-context.h@106
PS7, Line 106:   bool bloom_filter() const { return filter->filter_desc().bloom_filter;
}
Long lines


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h@140
PS7, Line 140:   static void Or(const TMinMaxFilter& in, TMinMaxFilter* out);
Nit: not sure why this is using underscores while AlwaysTrue() is using camel case.

To me it feels like virtual functions should be camel case.


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h@176
PS7, Line 176:  private:
Can you briefly mention what it means when these are empty - that the values haven't been
materialized.


http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@291
PS6, Line 291:       BigIntMinMaxFilter::Or(in, out);
> TColumnData doesn't have a timestamp field, so I convert timestamps with Ut
Oh ok, thanks for clarifying. It looked superficially like a last line copy/paste error (https://www.viva64.com/en/b/0260/).

The microseconds unix time does have lower precision than Impala's internal timestamp and
I'm not sure if it has the same range. It may not affect correctness but it's a little tricky
to reason about all the possible cases. It seems like it would be better to keep full precision
until we have to push it to kudu - maybe we could extend TColumnValue to support a native
timestamp type.

Rather confusingly, we have two TColumnValue classes - the HS2 one, which is only used in
a few places, and our internal one. It seems safe to modify our internal one to include a
native timestamp field.

In another place we convert timestamps to string to store in in a TColumnValue - see SetTColumnValue()
in fe-support.cc - but it's not totally clear to me why (that code's been there since this
commit in 2012: be98df19c8efe59577e6faaa6089f0383009b703).


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@137
PS7, Line 137:   if (!in.always_false) {
Do we have end-to-end tests for these code paths? I think we could generally do with some
more targeted tests for edge cases of min/max filters.


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@143
PS7, Line 143: rin
static_cast instead of int()?


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@143
PS7, Line 143:       out->max.__set_string_val(in.max.string_val);
This has a fairly large number of edge cases - it would be good to unit test it.


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@150
PS7, Line 150:   out->min.__set_string_val(in.min.string_val);
I would have expected that we would modify the trailing values that overflowed as well.

i.e. incrementing [0, 255] should give [1, 0]


http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@207
PS7, Line 207:     case PrimitiveType::TYPE_DOUBLE:
It seems tricky to test these various error paths in end-to-end tests. Could we exercise them
in unit tests?

Maybe also add a brief comment to explain how it's handling the error?



-- 
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mjacobs@apache.org>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <todd@apache.org>
Gerrit-Comment-Date: Mon, 23 Oct 2017 22:32:15 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message