Date: Thu, 17 Nov 2016 18:08:33 +0000
From: "Dan Hecht (Code Review)"
To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Reply-To: dhecht@cloudera.com
Subject: [Impala-ASF-CR] IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions
X-Gerrit-Commit: b9ff0e096435cbaabd5ae7368f85d0c3c80e3cc3

Dan Hecht has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5129

Change subject: IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions
......................................................................

IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions

Using TimestampValue (or an equivalent string representation) for
timestamps that must identify a point in time doesn't work, because the
same timestamp can represent multiple points in time. For example, the
timestamp '2016-11-13 01:01 AM' occurred twice last weekend.

Instead, we should use unix time directly rather than trying to derive
unix time from a (timezone-less) timestamp.

Note that there are other questionable uses of TimestampValue for
internal Impala service stuff, but I want to fix them separately, as
they are not as important and fixing them adds some risk.

While I'm here, remove a template TimestampValue constructor that was
unused and is confusing.

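To make the ambiguity concrete, here is a minimal Python sketch, not code from this change. It assumes a US Pacific fall-back transition (2016-11-06) and models it with two fixed UTC offsets rather than a real timezone database; the helper names `wall_clock` and `unix_time` are hypothetical. A query that starts just before the transition and ends two minutes later has an end wall-clock time that is earlier than its start, while the unix times stay ordered.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for US Pacific time on either side of
# the 2016-11-06 fall-back transition (PDT = UTC-7 before, PST = UTC-8 after).
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")


def wall_clock(dt):
    """Timezone-less string, comparable to a timestamp without zone info."""
    return dt.strftime("%Y-%m-%d %H:%M")


def unix_time(dt):
    """Seconds since the epoch; identifies a single point in time."""
    return int(dt.timestamp())


# The query starts at 01:59 PDT and ends two real minutes later, by which
# point local clocks have been set back one hour and read 01:01 PST.
query_start = datetime(2016, 11, 6, 1, 59, tzinfo=PDT)
query_end = (query_start + timedelta(minutes=2)).astimezone(PST)

print(wall_clock(query_start), unix_time(query_start))
print(wall_clock(query_end), unix_time(query_end))

# The wall-clock strings go backwards, the unix times do not.
print("wall clock backwards:", wall_clock(query_end) < wall_clock(query_start))
print("unix time forwards:", unix_time(query_end) > unix_time(query_start))
```

Because the strings carry no offset, nothing in them distinguishes the first 01:01 from the second, which is why deriving unix time from such a value cannot work in general.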
We don't have any end-to-end tests that exercise column lineage, so add
a simple custom cluster test that enables lineage and verifies that the
start and end unix times are within appropriate bounds. The other column
lineage graph fields are at least tested via planner tests.

Automated regression testing for the specific daylight savings issue is
difficult, as we'd have to cross the daylight savings boundary at just
the right time during query execution in order to reproduce it reliably.
But I'm open to ideas.

Testing:
- loop the new test overnight without any failures.
- exhaustive run.

Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9
---
M be/src/runtime/timestamp-value.h
M be/src/service/impala-server.cc
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A tests/custom_cluster/test_lineage.py
6 files changed, 87 insertions(+), 30 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/5129/1
--
To view, visit http://gerrit.cloudera.org:8080/5129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht
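The bounds check described for the new custom cluster test could look roughly like the following Python sketch. It is an illustration under assumptions, not the contents of tests/custom_cluster/test_lineage.py: the names `run_with_time_bounds`, `times_within_bounds`, `fake_query`, and the keys `start_time` and `end_time` are hypothetical, and a stub stands in for running a real query with lineage enabled.

```python
import time


def run_with_time_bounds(run_query):
    """Record unix time before and after running a query.

    run_query is any callable returning a dict with the query's start and
    end unix times (hypothetical keys 'start_time' and 'end_time').
    """
    before = int(time.time())
    lineage = run_query()
    after = int(time.time())
    return before, lineage, after


def times_within_bounds(lineage, before, after):
    """True if before <= start_time <= end_time <= after."""
    return before <= lineage["start_time"] <= lineage["end_time"] <= after


def fake_query():
    # Stub for illustration only; a real test would execute a query and
    # read the times from the lineage record it produces.
    start = int(time.time())
    end = int(time.time())
    return {"start_time": start, "end_time": end}


before, lineage, after = run_with_time_bounds(fake_query)
print("within bounds:", times_within_bounds(lineage, before, after))
```

Since every value here is unix time, the comparison holds regardless of the local timezone or of any daylight savings transition that happens while the query runs.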