impala-reviews mailing list archives

From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4536: Decimal Parquet slots should be validated
Date Fri, 06 Jan 2017 13:07:44 GMT
Lars Volker has posted comments on this change.

Change subject: IMPALA-4536: Decimal Parquet slots should be validated
......................................................................


Patch Set 1:

> If there's any interest, I could add this functionality to the
> parquet-reader tool, which is a binary we provide that does basic
> sanity checking on parquet files. That way we still provide some
> way to check for invalid decimals without it being on the critical
> path of query execution.
>
> I think though that parquet-reader doesn't currently actually
> decode the individual values, it mostly looks at header
> information, so I'm not sure how easy this would be.

+1 for adding the functionality to extract and examine the data. I'm currently working on
write support for min/max statistics in Parquet files and would probably try to add that
to parquet-reader as another validation step.

-- 
To view, visit http://gerrit.cloudera.org:8080/5525
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1ef3ab410843b33925d0387fcfd3bc4520e2fd81
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: No
