Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 78B89200BCA for ; Mon, 21 Nov 2016 20:44:35 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 77514160AF9; Mon, 21 Nov 2016 19:44:35 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id C08AE160AEF for ; Mon, 21 Nov 2016 20:44:34 +0100 (CET) Received: (qmail 72016 invoked by uid 500); 21 Nov 2016 19:44:34 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 72000 invoked by uid 99); 21 Nov 2016 19:44:33 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 21 Nov 2016 19:44:33 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id E97D5C0FD4 for ; Mon, 21 Nov 2016 19:44:32 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id Y-2bdYkTeyAf for ; Mon, 21 Nov 2016 19:44:31 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id D1BEA5F1BE for ; Mon, 21 Nov 2016 19:44:30 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id uALJhRqP022812; Mon, 21 Nov 2016 19:43:27 GMT Message-Id: <201611211943.uALJhRqP022812@ip-10-146-233-104.ec2.internal> Date: Mon, 21 Nov 2016 19:43:27 +0000 From: "Dan Hecht (Code Review)" To: Taras Bobrovytsky , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Lars Volker , Laszlo Gaal , Alex Behm Reply-To: dhecht@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-4363=3A_Add_Parquet_timestamp_validation=0A?= X-Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf X-Gerrit-ChangeURL: X-Gerrit-Commit: 01935d470a644338df7144c692d68d484203b436 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Mon, 21 Nov 2016 19:44:35 -0000 Dan Hecht has posted comments on this change. Change subject: IMPALA-4363: Add Parquet timestamp validation ...................................................................... Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 473: ErrorMsg msg; this has a non-trivial destructor, so we should probably only construct it after checking NeedsValidation(). Also, why are NeedsValidation() and NeedsConversion() mutually exclusive? Actually, doesn't this break --convert_legacy_hive_parquet_utc_timestamps since we'll never take the conversion path for timestamp now? (Do we not have tests for that?) That is, I think both NeedsConversion() and NeedsValidation() can be true, and the code needs to be adjusted for that. Also consider tucking lines 475 to 480 inside ValidateSlot() and that way the ErrorMsg doesn't have to be passed back and forth, and it'll naturally keep the construction after the NeedsValidation() check. PS7, Line 474: val_ptr you'll also have to be careful that we pass the right pointer here, depending on how you address the comment above. http://gerrit.cloudera.org:8080/#/c/4968/7/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 147: /// Verifies that the timestamp date falls into a valid range (years 1400..9999). update this comment to be clear that we also validate time. PS7, Line 156: l doesn't this have to be "ll" (long long)? PS7, Line 160: time_.ticks() < NUMBER_OF_NANOSECONDS_IN_24H I'm slightly worried about how this interacts with leap seconds. Are you sure this is always illegal? If it doesn't crash, what behavior did you see (i.e. how do you know this is the correct bounds)? -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: Yes