impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5061: Populate null count in parquet::statistics in the parquet table writer
Date Fri, 02 Jun 2017 21:48:47 GMT
Lars Volker has posted comments on this change.

Change subject: IMPALA-5061: Populate null_count in parquet::statistics in the parquet table

Patch Set 2:


Thank you for working on this!
Commit Message:

Line 7: IMPALA-5061: Populate null_count in parquet::statistics in the parquet table writer
Nit: Make the first line of the commit message fit in 80 chars. You can end it after "statistics"
File be/src/exec/

Line 553:       // We need to get min_value.
Let's only do the write path in this change and then address the read path in a subsequent
File be/src/exec/

PS2, Line 198: {}
Do you need the default implementation here? Without it, you could also remove the "Implemented
in.." comment, since then it'd be clear.
File be/src/exec/parquet-column-stats.h:

Line 79:   static bool ReadCountFromThrift(
I think it will be easier to return an int64 here. However we should do the read path in a
later change altogether.

Line 115:   bool has_null_count_;
Why do we need has_null_count_ instead of just always counting them? Are there types for which
we don't know the number of nulls we wrote?

Line 170:   void UpdateNullCount();
Can this go into the base class? It doesn't depend on T.
File be/src/exec/parquet-column-stats.inline.h:

Line 50:   null_count_++;
nit: we use prefix operators, mostly for consistency, but sometimes for performance reasons.
File tests/query_test/

Line 323:       if ((stats.min_value is None and stats.max_value is None) 
With the additional checks below, is this line still needed? What is its intent?

Please remove the trailing whitespace

Line 328:       min_value = max_value = null_count = None
You should have one assignment per line

Line 427:         ColumnStats('id', 0, 7299, None),
Why are these None and not 0? I think we should count 0s even if there are none.

PS2, Line 607: 	
Replace tabs with spaces, here and elsewhere

Line 620: 									"functional_parquet.zipcode_incomes", expected_min_max_values)
The formatting here seems off

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I4c49a63af84c2234f0633be63206cb52eb7e8ebb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-HasComments: Yes

View raw message