Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 77BA8200BD3 for ; Tue, 22 Nov 2016 04:07:15 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 768B0160AF9; Tue, 22 Nov 2016 03:07:15 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 7B4F5160B1C for ; Tue, 22 Nov 2016 04:07:14 +0100 (CET) Received: (qmail 2332 invoked by uid 500); 22 Nov 2016 03:07:13 -0000 Mailing-List: contact commits-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@impala.incubator.apache.org Delivered-To: mailing list commits@impala.incubator.apache.org Received: (qmail 2322 invoked by uid 99); 22 Nov 2016 03:07:13 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 22 Nov 2016 03:07:13 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 1A49FC6EF6 for ; Tue, 22 Nov 2016 03:07:13 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -6.218 X-Spam-Level: X-Spam-Status: No, score=-6.218 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, KAM_LAZY_DOMAIN_SECURITY=1, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-2.999, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id 1WakqUuuQ6dn for ; Tue, 22 Nov 2016 03:07:11 +0000 (UTC) Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with SMTP id E3D7F5FB75 for ; Tue, 22 Nov 2016 03:07:09 +0000 (UTC) Received: (qmail 99713 invoked by uid 99); 22 Nov 2016 03:05:54 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 22 Nov 2016 03:05:54 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id E1293E38F9; Tue, 22 Nov 2016 03:05:53 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: sailesh@apache.org To: commits@impala.incubator.apache.org Date: Tue, 22 Nov 2016 03:05:54 -0000 Message-Id: <9faa35fa898948f6876602356574edb7@git.apache.org> In-Reply-To: References: X-Mailer: ASF-Git Admin Mailer Subject: [2/6] incubator-impala git commit: IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions archived-at: Tue, 22 Nov 2016 03:07:15 -0000 IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions Using TimestampValue (or equivalent string representation) for timestamps that require a point in time doesn't work because the same time can represent multiple point in times. For example, the timestamp: '2016-11-13 01:01 AM' occurred twice last weekend. Instead, we should use unix time directly rather than trying to derive unix time from a (timezone-less) timestamp. Note that there are other questionable uses of TimestampValue for internal Impala service stuff, but I want to fix them separately as they are not as important and fixing does add some risk. While I'm here, remove a template TimestampValue constructor that was unused and is confusing. We don't have any end-to-end tests that exercise column lineage, so add a simple custom cluster test that enables lineage and verifes the start and end unix times are within appropriate bounds. The other column lineage graph fields are at least tested via planner tests. Automated regression testing for the specifc daylight savings issue is difficult as we'd have to cross the daylight savings boundary at just the right time during query execution in order to reproduce reliably. But open to ideas. Testing: - loop the new test overnight without any failures. - exhaustive run. Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9 Reviewed-on: http://gerrit.cloudera.org:8080/5129 Reviewed-by: Dan Hecht Tested-by: Internal Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/035b775a Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/035b775a Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/035b775a Branch: refs/heads/master Commit: 035b775a6dca0609ffec049f765b311d4eda6ebd Parents: de88f0c Author: Dan Hecht Authored: Thu Nov 17 10:06:28 2016 -0800 Committer: Internal Jenkins Committed: Mon Nov 21 22:18:37 2016 +0000 ---------------------------------------------------------------------- be/src/runtime/timestamp-value.h | 18 ----- be/src/service/impala-server.cc | 8 +- common/thrift/ImpalaInternalService.thrift | 3 + .../impala/analysis/ColumnLineageGraph.java | 9 +-- .../org/apache/impala/testutil/TestUtils.java | 1 + tests/custom_cluster/test_lineage.py | 77 ++++++++++++++++++++ 6 files changed, 86 insertions(+), 30 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/be/src/runtime/timestamp-value.h ---------------------------------------------------------------------- diff --git a/be/src/runtime/timestamp-value.h b/be/src/runtime/timestamp-value.h index ab3f081..e722375 100644 --- a/be/src/runtime/timestamp-value.h +++ b/be/src/runtime/timestamp-value.h @@ -84,11 +84,6 @@ class TimestampValue { /// Unix time (seconds since 1970-01-01 UTC by definition) constructors. /// Conversion to local time will be done if /// FLAGS_use_local_tz_for_unix_timestamp_conversions is true. - template - explicit TimestampValue(Number unix_time) { - *this = UnixTimeToPtime(unix_time); - } - TimestampValue(int64_t unix_time, int64_t nanos) { boost::posix_time::ptime temp = UnixTimeToPtime(unix_time); temp += boost::posix_time::nanoseconds(nanos); @@ -171,19 +166,6 @@ class TimestampValue { return true; } - /// Converts to Unix time (seconds since the Unix epoch) in UTC corresponding to this - /// Timestamp instance. - /// Returns false if the conversion failed (utc_time will be undefined), otherwise - /// true. - bool ToUnixTimeInUTC(time_t* utc_time) const { - DCHECK(utc_time != NULL); - if (UNLIKELY(!HasDateAndTime())) return false; - const boost::posix_time::ptime temp(date_, time_); - tm temp_tm = boost::posix_time::to_tm(temp); - *utc_time = mktime(&temp_tm); - return true; - } - /// Converts to Unix time with fractional seconds. /// Returns false if the conversion failed (unix_time will be undefined), otherwise /// true. http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/be/src/service/impala-server.cc ---------------------------------------------------------------------- diff --git a/be/src/service/impala-server.cc b/be/src/service/impala-server.cc index c210e28..49e81c8 100644 --- a/be/src/service/impala-server.cc +++ b/be/src/service/impala-server.cc @@ -375,10 +375,9 @@ Status ImpalaServer::LogLineageRecord(const QueryExecState& query_exec_state) { } else { return Status::OK(); } - // Set the query end time in TLineageGraph - time_t utc_end_time; - query_exec_state.end_time().ToUnixTimeInUTC(&utc_end_time); - lineage_graph.__set_ended(utc_end_time); + // Set the query end time in TLineageGraph. Must use UNIX time directly rather than + // e.g. converting from query_exec_state.end_time() (IMPALA-4440). + lineage_graph.__set_ended(UnixMillis() / 1000); string lineage_record; LineageUtil::TLineageToJSON(lineage_graph, &lineage_record); const Status& status = lineage_logger_->AppendEntry(lineage_record); @@ -836,6 +835,7 @@ Status ImpalaServer::ExecuteInternal( void ImpalaServer::PrepareQueryContext(TQueryCtx* query_ctx) { query_ctx->__set_pid(getpid()); query_ctx->__set_now_string(TimestampValue::LocalTime().DebugString()); + query_ctx->__set_start_unix_millis(UnixMillis()); query_ctx->__set_coord_address(MakeNetworkAddress(FLAGS_hostname, FLAGS_be_port)); // Creating a random_generator every time is not free, but http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/common/thrift/ImpalaInternalService.thrift ---------------------------------------------------------------------- diff --git a/common/thrift/ImpalaInternalService.thrift b/common/thrift/ImpalaInternalService.thrift index e510331..25af712 100644 --- a/common/thrift/ImpalaInternalService.thrift +++ b/common/thrift/ImpalaInternalService.thrift @@ -311,6 +311,9 @@ struct TQueryCtx { // Contains only the union of those descriptors referenced by list of fragments destined // for a single host. Optional for frontend tests. 12: optional Descriptors.TDescriptorTable desc_tbl + + // Milliseconds since UNIX epoch at the start of query execution. + 13: required i64 start_unix_millis } // Context to collect information, which is shared among all instances of that plan http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java ---------------------------------------------------------------------- diff --git a/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java b/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java index 131d75a..d84cccc 100644 --- a/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java +++ b/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java @@ -375,14 +375,7 @@ public class ColumnLineageGraph { queryStr_ = queryCtx.request.stmt; } Preconditions.checkNotNull(queryStr_); - SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); - try { - timestamp_ = df.parse(queryCtx.now_string).getTime() / 1000; - } catch (java.text.ParseException e) { - LOG.error("Error parsing timestamp value: " + queryCtx.now_string + - " " + e.getMessage()); - timestamp_ = new Date().getTime() / 1000; - } + timestamp_ = queryCtx.start_unix_millis / 1000; descTbl_ = analyzer.getDescTbl(); user_ = analyzer.getUser().getName(); } http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/fe/src/test/java/org/apache/impala/testutil/TestUtils.java ---------------------------------------------------------------------- diff --git a/fe/src/test/java/org/apache/impala/testutil/TestUtils.java b/fe/src/test/java/org/apache/impala/testutil/TestUtils.java index 5454c7b..fd07971 100644 --- a/fe/src/test/java/org/apache/impala/testutil/TestUtils.java +++ b/fe/src/test/java/org/apache/impala/testutil/TestUtils.java @@ -256,6 +256,7 @@ public class TestUtils { defaultDb, user, new TNetworkAddress("localhost", 0))); SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSSSSS"); queryCtx.setNow_string(formatter.format(Calendar.getInstance().getTime())); + queryCtx.setStart_unix_millis(System.currentTimeMillis()); queryCtx.setPid(1000); return queryCtx; } http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/tests/custom_cluster/test_lineage.py ---------------------------------------------------------------------- diff --git a/tests/custom_cluster/test_lineage.py b/tests/custom_cluster/test_lineage.py new file mode 100644 index 0000000..825e1b7 --- /dev/null +++ b/tests/custom_cluster/test_lineage.py @@ -0,0 +1,77 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +# Tests for column lineage. +# TODO: add verification for more fields. + +import json +import logging +import os +import pytest +import shutil +import stat +import tempfile +import time + +from tests.common.custom_cluster_test_suite import CustomClusterTestSuite + +LOG = logging.getLogger(__name__) + +class TestLineage(CustomClusterTestSuite): + + lineage_log_dir = tempfile.mkdtemp(); + + query = """ + select count(*) from functional.alltypes + """ + + @classmethod + def setup_class(cls): + super(TestLineage, cls).setup_class() + + @classmethod + def teardown_class(cls): + shutil.rmtree(cls.lineage_log_dir); + + @pytest.mark.execute_serially + @CustomClusterTestSuite.with_args("--lineage_event_log_dir=%s" % lineage_log_dir) + def test_start_end_timestamp(self, vector): + """Test that 'timestamp' and 'endTime' in the lineage graph are populated with valid + UNIX times.""" + LOG.info("lineage_event_log_dir is " + self.lineage_log_dir) + before_time = int(time.time()) + self.execute_query_expect_success(self.client, self.query) + after_time = int(time.time()) + LOG.info("before_time " + str(before_time) + " after_time " + str(after_time)) + + # Stop the cluster in order to flush the lineage log files. + self._stop_impala_cluster() + + for log_filename in os.listdir(self.lineage_log_dir): + log_path = os.path.join(self.lineage_log_dir, log_filename) + # Only the coordinator's log file will be populated. + if os.path.getsize(log_path) > 0: + LOG.info("examining file: " + log_path) + with open(log_path) as log_file: + lineage_json = json.load(log_file) + timestamp = int(lineage_json["timestamp"]) + end_time = int(lineage_json["endTime"]) + assert before_time <= timestamp + assert timestamp <= end_time + assert end_time <= after_time + else: + LOG.info("empty file: " + log_path)