impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4363: Add Parquet timestamp validation
Date Wed, 16 Nov 2016 06:49:09 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
......................................................................


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 77: 
> I'll remove the empty lines I accidentally added here and in other files.
Done


Line 233:     // TODO: consider adding validation for other types, such as DECIMAL.
> Let's file a JIRA instead. You don't need to identify whether there is a bu
Done


Line 388:               ReadSlot<IS_DICT_ENCODED>(tuple, pool);
> single line if it fits
Done


Line 475:     if (NeedsValidation() && !ValidateSlot(reinterpret_cast<T*>(slot)))
{
> The API seems a little clunky because we call a generic ValidateSlot() but 
Added ErrorMsg* to ValidateSlot.


Line 503:     DCHECK(!needs_conversion_);
> DCHECK(false)?
This DCHECK should be removed completely.


Line 513:   /// Sets the slot to NULL if the slot value is invalid, e.g., due to being
> Update comment, doesn't set anything
Done


Line 515:   bool ValidateSlot(T* src) {
> const function
Done


Line 520:   /// Pull out slow-path Status construction code from ReadRepetitionLevel()/
> remove the last part "from ReadRepetitionLevel..." I don't think that's acc
Done


Line 1084:   void* slot = tuple->GetSlot(tuple_offset_);
> just inline into L1087
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

Line 511:   /// Recursively reads from children_ to assemble a single CollectionValue into
> Update comment to reflect API change
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 22: 
> remove blank lines
Done


Line 150:   inline bool IsValidDate() const {
> nice
:)


http://gerrit.cloudera.org:8080/#/c/4968/4/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 312:    "The valid date range is 1400..9999."),
> mention the full range with year-month-day to make it clearer
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

Line 382:   hadoop fs -put -f ${IMPALA_HOME}/testdata/bad_parquet_data/out-of-range-timestamp.parq
\
> Let's avoid doing this and instead create a test table on-the-fly in the te
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 248:   def test_zero_rows(self, vector, unique_database):
> it would be nice to have a new test test_timestamp_out_of_range that looks 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message