impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Date Fri, 27 Jan 2017 21:47:56 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................


Patch Set 7: Code-Review+1

(9 comments)

http://gerrit.cloudera.org:8080/#/c/5611/6/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS6, Line 178: ProcessValue
> Marcel had suggested that name, but I'm good with either. Marcel, do you ha
That's fine then, no need to keep renaming it :)


Line 389:   virtual bool ProcessValue(void* value, int64_t* bytes_needed) {
> Done, though it has the same number of lines, but now uses two return state
It doesn't make a big difference in this case - we just tend to use the early-return pattern.


http://gerrit.cloudera.org:8080/#/c/5611/7/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 325:     self.execute_query("drop table %s" % qualified_table_name)
Not needed - the table is dropped along with the unique_database.


Line 434:   def test_write_statistics_multiple_row_groups(self, vector, unique_database):
Nice!


PS7, Line 446: num_lines
num_rows?


Line 447:     query = "create table %s like %s stored as parquet" % \
A while back someone who was more up-to-date on Python suggested that it was better to use
.format() instead of % for string formatting. See https://docs.python.org/3.4/library/stdtypes.html#old-string-formatting

I don't feel strongly but thought I should mention it.
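
For illustration, a minimal sketch of the two styles side by side. The table names here are hypothetical placeholders, not the ones used in the patch:

```python
# Hypothetical table names, for illustration only (not taken from the patch).
table_name = "some_db.target_table"
source_table = "some_db.source_table"

# Old-style formatting with the % operator, as on the line under review.
old_style = "create table %s like %s stored as parquet" % (table_name, source_table)

# Equivalent statement built with str.format().
new_style = "create table {0} like {1} stored as parquet".format(table_name, source_table)

# Both produce the same SQL text.
assert old_style == new_style
```

Both forms build the same string, so this is purely a style choice.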


Line 465:       assert l.max < r.min
Maybe this should be <=? E.g. consider two row groups that only have one value for that
column.
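
A small sketch of the case I have in mind. The Stats tuple and the helper below are hypothetical stand-ins for whatever the test reads from the file metadata, not code from the patch:

```python
from collections import namedtuple

# Hypothetical stand-in for the min/max statistics of one column in one row group.
Stats = namedtuple("Stats", ["min", "max"])


def row_groups_ordered(stats, strict):
    """Check that each row group's max does not exceed the next group's min.

    With strict=True the comparison is '<'; otherwise it is '<='.
    """
    for l, r in zip(stats, stats[1:]):
        ok = l.max < r.min if strict else l.max <= r.min
        if not ok:
            return False
    return True


# Two adjacent row groups that each contain only the value 5 for this column.
single_value_groups = [Stats(min=5, max=5), Stats(min=5, max=5)]

strict_result = row_groups_ordered(single_value_groups, strict=True)
relaxed_result = row_groups_ordered(single_value_groups, strict=False)

assert strict_result is False   # '<' rejects correctly sorted data
assert relaxed_result is True   # '<=' accepts it
```

With the strict comparison the assertion would fail on this input even though the rows are in sorted order.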


Line 467:     self.execute_query("drop table %s" % qualified_target_table)
Not needed - the table is dropped along with the unique_database.


Line 469:   def test_write_statistics_float_infinity(self, vector, unique_database):
Didn't think of this - good catch.


-- 
To view, visit http://gerrit.cloudera.org:8080/5611

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+gerrit@cloudera.com>
Gerrit-HasComments: Yes
