impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation
Date Wed, 16 Nov 2016 06:49:09 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation

Patch Set 4:

File be/src/exec/

Line 77: 
> I'll remove the empty lines I accidentally added here and in other files.

Line 233:     // TODO: consider adding validation for other types, such as DECIMAL.
> Let's file a JIRA instead. You don't need to identify whether there is a bu

Line 388:               ReadSlot<IS_DICT_ENCODED>(tuple, pool);
> single line if it fits

Line 475:     if (NeedsValidation() && !ValidateSlot(reinterpret_cast<T*>(slot)))
> The API seems a little clunky because we call a generic ValidateSlot() but 
Added ErrorMsg* to ValidateSlot.

Line 503:     DCHECK(!needs_conversion_);
> DCHECK(false)?
This DCHECK should be removed completely.

Line 513:   /// Sets the slot to NULL if the slot value is invalid, e.g., due to being
> Update comment, doesn't set anything

Line 515:   bool ValidateSlot(T* src) {
> const function

Line 520:   /// Pull out slow-path Status construction code from ReadRepetitionLevel()/
> remove the last part "from ReadRepetitionLevel..." I don't think that's acc

Line 1084:   void* slot = tuple->GetSlot(tuple_offset_);
> just inline into L1087
File be/src/exec/parquet-column-readers.h:

Line 511:   /// Recursively reads from children_ to assemble a single CollectionValue into
> Update comment to reflect API change
File be/src/runtime/timestamp-value.h:

Line 22: 
> remove blank lines

Line 150:   inline bool IsValidDate() const {
> nice
File common/thrift/

Line 312:    "The valid date range is 1400..9999."),
> mention the full range with year-month-day to make it clearer
File testdata/bin/

Line 382:   hadoop fs -put -f ${IMPALA_HOME}/testdata/bad_parquet_data/out-of-range-timestamp.parq
> Let's avoid doing this and instead create a test table on-the-fly in the te
File tests/query_test/

Line 248:   def test_zero_rows(self, vector, unique_database):
> it would be nice to have a new test test_timestamp_out_of_range that looks 

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Taras Bobrovytsky <>
Gerrit-HasComments: Yes

View raw message