impala-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sail...@apache.org
Subject [2/6] incubator-impala git commit: IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions
Date Tue, 22 Nov 2016 03:05:54 GMT
IMPALA-4440: lineage timestamps can go backwards across daylight savings transitions

Using TimestampValue (or equivalent string representation) for
timestamps that require a point in time doesn't work because the same
time can represent multiple point in times.  For example, the timestamp:
'2016-11-13 01:01 AM' occurred twice last weekend.

Instead, we should use unix time directly rather than trying to derive
unix time from a (timezone-less) timestamp.

Note that there are other questionable uses of TimestampValue for
internal Impala service stuff, but I want to fix them separately as they
are not as important and fixing does add some risk.

While I'm here, remove a template TimestampValue constructor that was
unused and is confusing.

We don't have any end-to-end tests that exercise column lineage, so add
a simple custom cluster test that enables lineage and verifes the start
and end unix times are within appropriate bounds.  The other column
lineage graph fields are at least tested via planner tests.

Automated regression testing for the specifc daylight savings issue is
difficult as we'd have to cross the daylight savings boundary at just
the right time during query execution in order to reproduce
reliably. But open to ideas.

Testing:
- loop the new test overnight without any failures.
- exhaustive run.

Change-Id: I34e435fc3511e65bc62906205cb558f2c116a8a9
Reviewed-on: http://gerrit.cloudera.org:8080/5129
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/035b775a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/035b775a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/035b775a

Branch: refs/heads/master
Commit: 035b775a6dca0609ffec049f765b311d4eda6ebd
Parents: de88f0c
Author: Dan Hecht <dhecht@cloudera.com>
Authored: Thu Nov 17 10:06:28 2016 -0800
Committer: Internal Jenkins <cloudera-hudson@gerrit.cloudera.org>
Committed: Mon Nov 21 22:18:37 2016 +0000

----------------------------------------------------------------------
 be/src/runtime/timestamp-value.h                | 18 -----
 be/src/service/impala-server.cc                 |  8 +-
 common/thrift/ImpalaInternalService.thrift      |  3 +
 .../impala/analysis/ColumnLineageGraph.java     |  9 +--
 .../org/apache/impala/testutil/TestUtils.java   |  1 +
 tests/custom_cluster/test_lineage.py            | 77 ++++++++++++++++++++
 6 files changed, 86 insertions(+), 30 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/be/src/runtime/timestamp-value.h
----------------------------------------------------------------------
diff --git a/be/src/runtime/timestamp-value.h b/be/src/runtime/timestamp-value.h
index ab3f081..e722375 100644
--- a/be/src/runtime/timestamp-value.h
+++ b/be/src/runtime/timestamp-value.h
@@ -84,11 +84,6 @@ class TimestampValue {
   /// Unix time (seconds since 1970-01-01 UTC by definition) constructors.
   /// Conversion to local time will be done if
   /// FLAGS_use_local_tz_for_unix_timestamp_conversions is true.
-  template <typename Number>
-  explicit TimestampValue(Number unix_time) {
-    *this = UnixTimeToPtime(unix_time);
-  }
-
   TimestampValue(int64_t unix_time, int64_t nanos) {
     boost::posix_time::ptime temp = UnixTimeToPtime(unix_time);
     temp += boost::posix_time::nanoseconds(nanos);
@@ -171,19 +166,6 @@ class TimestampValue {
     return true;
   }
 
-  /// Converts to Unix time (seconds since the Unix epoch) in UTC corresponding to this
-  /// Timestamp instance.
-  /// Returns false if the conversion failed (utc_time will be undefined), otherwise
-  /// true.
-  bool ToUnixTimeInUTC(time_t* utc_time) const {
-    DCHECK(utc_time != NULL);
-    if (UNLIKELY(!HasDateAndTime())) return false;
-    const boost::posix_time::ptime temp(date_, time_);
-    tm temp_tm = boost::posix_time::to_tm(temp);
-    *utc_time = mktime(&temp_tm);
-    return true;
-  }
-
   /// Converts to Unix time with fractional seconds.
   /// Returns false if the conversion failed (unix_time will be undefined), otherwise
   /// true.

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/be/src/service/impala-server.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-server.cc b/be/src/service/impala-server.cc
index c210e28..49e81c8 100644
--- a/be/src/service/impala-server.cc
+++ b/be/src/service/impala-server.cc
@@ -375,10 +375,9 @@ Status ImpalaServer::LogLineageRecord(const QueryExecState& query_exec_state)
{
   } else {
     return Status::OK();
   }
-  // Set the query end time in TLineageGraph
-  time_t utc_end_time;
-  query_exec_state.end_time().ToUnixTimeInUTC(&utc_end_time);
-  lineage_graph.__set_ended(utc_end_time);
+  // Set the query end time in TLineageGraph. Must use UNIX time directly rather than
+  // e.g. converting from query_exec_state.end_time() (IMPALA-4440).
+  lineage_graph.__set_ended(UnixMillis() / 1000);
   string lineage_record;
   LineageUtil::TLineageToJSON(lineage_graph, &lineage_record);
   const Status& status = lineage_logger_->AppendEntry(lineage_record);
@@ -836,6 +835,7 @@ Status ImpalaServer::ExecuteInternal(
 void ImpalaServer::PrepareQueryContext(TQueryCtx* query_ctx) {
   query_ctx->__set_pid(getpid());
   query_ctx->__set_now_string(TimestampValue::LocalTime().DebugString());
+  query_ctx->__set_start_unix_millis(UnixMillis());
   query_ctx->__set_coord_address(MakeNetworkAddress(FLAGS_hostname, FLAGS_be_port));
 
   // Creating a random_generator every time is not free, but

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/common/thrift/ImpalaInternalService.thrift
----------------------------------------------------------------------
diff --git a/common/thrift/ImpalaInternalService.thrift b/common/thrift/ImpalaInternalService.thrift
index e510331..25af712 100644
--- a/common/thrift/ImpalaInternalService.thrift
+++ b/common/thrift/ImpalaInternalService.thrift
@@ -311,6 +311,9 @@ struct TQueryCtx {
   // Contains only the union of those descriptors referenced by list of fragments destined
   // for a single host. Optional for frontend tests.
   12: optional Descriptors.TDescriptorTable desc_tbl
+
+  // Milliseconds since UNIX epoch at the start of query execution.
+  13: required i64 start_unix_millis
 }
 
 // Context to collect information, which is shared among all instances of that plan

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java b/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
index 131d75a..d84cccc 100644
--- a/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
+++ b/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
@@ -375,14 +375,7 @@ public class ColumnLineageGraph {
       queryStr_ = queryCtx.request.stmt;
     }
     Preconditions.checkNotNull(queryStr_);
-    SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
-    try {
-      timestamp_ = df.parse(queryCtx.now_string).getTime() / 1000;
-    } catch (java.text.ParseException e) {
-      LOG.error("Error parsing timestamp value: " + queryCtx.now_string +
-          " " + e.getMessage());
-      timestamp_ = new Date().getTime() / 1000;
-    }
+    timestamp_ = queryCtx.start_unix_millis / 1000;
     descTbl_ = analyzer.getDescTbl();
     user_ = analyzer.getUser().getName();
   }

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
----------------------------------------------------------------------
diff --git a/fe/src/test/java/org/apache/impala/testutil/TestUtils.java b/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
index 5454c7b..fd07971 100644
--- a/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
+++ b/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
@@ -256,6 +256,7 @@ public class TestUtils {
         defaultDb, user, new TNetworkAddress("localhost", 0)));
     SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSSSSS");
     queryCtx.setNow_string(formatter.format(Calendar.getInstance().getTime()));
+    queryCtx.setStart_unix_millis(System.currentTimeMillis());
     queryCtx.setPid(1000);
     return queryCtx;
   }

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/035b775a/tests/custom_cluster/test_lineage.py
----------------------------------------------------------------------
diff --git a/tests/custom_cluster/test_lineage.py b/tests/custom_cluster/test_lineage.py
new file mode 100644
index 0000000..825e1b7
--- /dev/null
+++ b/tests/custom_cluster/test_lineage.py
@@ -0,0 +1,77 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# Tests for column lineage.
+# TODO: add verification for more fields.
+
+import json
+import logging
+import os
+import pytest
+import shutil
+import stat
+import tempfile
+import time
+
+from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
+
+LOG = logging.getLogger(__name__)
+
+class TestLineage(CustomClusterTestSuite):
+
+  lineage_log_dir = tempfile.mkdtemp();
+
+  query = """
+      select count(*) from functional.alltypes
+      """
+
+  @classmethod
+  def setup_class(cls):
+    super(TestLineage, cls).setup_class()
+
+  @classmethod
+  def teardown_class(cls):
+    shutil.rmtree(cls.lineage_log_dir);
+
+  @pytest.mark.execute_serially
+  @CustomClusterTestSuite.with_args("--lineage_event_log_dir=%s" % lineage_log_dir)
+  def test_start_end_timestamp(self, vector):
+    """Test that 'timestamp' and 'endTime' in the lineage graph are populated with valid
+       UNIX times."""
+    LOG.info("lineage_event_log_dir is " + self.lineage_log_dir)
+    before_time = int(time.time())
+    self.execute_query_expect_success(self.client, self.query)
+    after_time = int(time.time())
+    LOG.info("before_time " + str(before_time) + " after_time " + str(after_time))
+
+    # Stop the cluster in order to flush the lineage log files.
+    self._stop_impala_cluster()
+
+    for log_filename in os.listdir(self.lineage_log_dir):
+      log_path = os.path.join(self.lineage_log_dir, log_filename)
+      # Only the coordinator's log file will be populated.
+      if os.path.getsize(log_path) > 0:
+        LOG.info("examining file: " + log_path)
+        with open(log_path) as log_file:
+          lineage_json = json.load(log_file)
+          timestamp = int(lineage_json["timestamp"])
+          end_time = int(lineage_json["endTime"])
+          assert before_time <= timestamp
+          assert timestamp <= end_time
+          assert end_time <= after_time
+      else:
+        LOG.info("empty file: " + log_path)


Mime
View raw message