kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From danburk...@apache.org
Subject kudu git commit: KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag
Date Mon, 13 Nov 2017 23:03:41 GMT
Repository: kudu
Updated Branches:
  refs/heads/branch-1.4.x 289b7cd6a -> a873b8d15


KUDU-2209. HybridClock doesn't handle changes in STA_NANO flag

Users have occasionally reported spurious crashes due to Kudu thinking
that another node has a time stamp from the future. After some debugging
I realized that the issue is that we currently capture the flag
'STA_NANO' from the kernel only at startup. This flag indicates whether
the kernel's sub-second timestamp is in nanoseconds or microseconds. We
initially assumed this was a static property of the kernel. However it
turns out that this flag can get toggled at runtime by ntp in certain
circumstances. Given this, it was possible for us to interpret a number
of nanoseconds as if it were microseconds, resulting in a timestamp up
to 1000 seconds in the future.

This patch changes the SystemNtp time source to always use the 'adjtime'
call to fetch the clock, and looks at the STA_NANO flag on every such
call rather than only at startup.

I checked the source for the ntp_gettime call that we used to use, and
it turns out it was implemented in terms of the same adjtime() call, so
there should be no performance penalty in this change.

This patch doesn't include tests since it's based on some kernel
functionality. However, I was able to test it as follows:

- wrote a simple program to print the time once a second
- stopped ntp, ran chrony, stopped chrony, and started ntpd
-- this has the side effect of clearing STA_NANO
- waited 10 minutes or so, and eventually ntp reset back to STA_NANO
-- this caused my test program to start printing incorrect timestamps
   as it was interpreting nanosecond values from the kernel as if they
   were microseconds

------------------
Backport notes
-------------------
The clock module was somewhat refactored for Kudu 1.5 and later, so the
cherry-pick was a fairly manual process. Essentially ported over the
same fix rather than trying to cherry-pick and resolve conflicts.

Reviewed-on: http://gerrit.cloudera.org:8080/8450
Reviewed-by: Alexey Serbin <aserbin@cloudera.com>
Tested-by: Alexey Serbin <aserbin@cloudera.com>
(cherry-picked from commit 10f6164b1217e0299bcfedc061d2c57581c389bd)

Change-Id: I4583a4d5ef26da791c9f6cba561a9aa2f22d3dd6
Reviewed-on: http://gerrit.cloudera.org:8080/8530
Reviewed-by: Alexey Serbin <aserbin@cloudera.com>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/a873b8d1
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/a873b8d1
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/a873b8d1

Branch: refs/heads/branch-1.4.x
Commit: a873b8d15dc6fc106ee5db2a21c2d064c255af65
Parents: 289b7cd
Author: Todd Lipcon <todd@cloudera.com>
Authored: Mon Nov 13 11:33:31 2017 -0800
Committer: Todd Lipcon <todd@apache.org>
Committed: Mon Nov 13 20:40:52 2017 +0000

----------------------------------------------------------------------
 src/kudu/server/hybrid_clock.cc | 68 ++++++++++++------------------------
 src/kudu/server/hybrid_clock.h  |  6 +---
 2 files changed, 23 insertions(+), 51 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/a873b8d1/src/kudu/server/hybrid_clock.cc
----------------------------------------------------------------------
diff --git a/src/kudu/server/hybrid_clock.cc b/src/kudu/server/hybrid_clock.cc
index 976bdc0..f71a431 100644
--- a/src/kudu/server/hybrid_clock.cc
+++ b/src/kudu/server/hybrid_clock.cc
@@ -70,30 +70,17 @@ namespace server {
 namespace {
 
 #if !defined(__APPLE__)
-// Returns the clock modes and checks if the clock is synchronized.
-Status GetClockModes(timex* timex) {
-  // this makes ntp_adjtime a read-only call
-  timex->modes = 0;
-  int rc = ntp_adjtime(timex);
-  if (PREDICT_FALSE(rc == TIME_ERROR)) {
-    return Status::ServiceUnavailable(
-        Substitute("Error reading clock. Clock considered unsynchronized. Return code: $0",
rc));
-  }
-  // TODO what to do about leap seconds? see KUDU-146
-  if (PREDICT_FALSE(rc != TIME_OK)) {
-    LOG(ERROR) << Substitute("TODO Server undergoing leap second. Return code: $0",
rc);
-  }
-  return Status::OK();
-}
 
 // Returns the current time/max error and checks if the clock is synchronized.
-kudu::Status GetClockTime(ntptimeval* timeval) {
-  int rc = ntp_gettime(timeval);
+Status CallAdjTime(timex* tx) {
+  // Set mode to 0 to query the current time.
+  tx->modes = 0;
+  int rc = ntp_adjtime(tx);
   switch (rc) {
     case TIME_OK:
       return Status::OK();
     case -1: // generic error
-      return Status::ServiceUnavailable("Error reading clock. ntp_gettime() failed",
+      return Status::ServiceUnavailable("Error reading clock. ntp_adjtime() failed",
                                         ErrnoToString(errno));
     case TIME_ERROR:
       return Status::ServiceUnavailable("Error reading clock. Clock considered unsynchronized");
@@ -128,16 +115,13 @@ const int HybridClock::kBitsToShift = 12;
 // This mask gives us back the logical bits.
 const uint64_t HybridClock::kLogicalBitMask = (1 << kBitsToShift) - 1;
 
-const uint64_t HybridClock::kNanosPerSec = 1000000;
+const uint64_t HybridClock::kMicrosPerSec = 1000000;
 
 const double HybridClock::kAdjtimexScalingFactor = 65536;
 
 HybridClock::HybridClock()
     : mock_clock_time_usec_(0),
       mock_clock_max_error_usec_(0),
-#if !defined(__APPLE__)
-      divisor_(1),
-#endif
       tolerance_adjustment_(1),
       next_timestamp_(0),
       state_(kNotInitialized) {
@@ -153,30 +137,16 @@ Status HybridClock::Init() {
   LOG(WARNING) << "HybridClock initialized in local mode (OS X only). "
                << "Not suitable for distributed clusters.";
 #else
-  // Read the current time. This will return an error if the clock is not synchronized.
-  uint64_t now_usec;
-  uint64_t error_usec;
-  RETURN_NOT_OK(WalltimeWithError(&now_usec, &error_usec));
-
   timex timex;
-  RETURN_NOT_OK(GetClockModes(&timex));
-  // read whether the STA_NANO bit is set to know whether we'll get back nanos
-  // or micros in timeval.time.tv_usec. See:
-  // http://stackoverflow.com/questions/16063408/does-ntp-gettime-actually-return-nanosecond-precision
-  // set the timeval.time.tv_usec divisor so that we always get micros
-  if (timex.status & STA_NANO) {
-    divisor_ = 1000;
-  } else {
-    divisor_ = 1;
-  }
+  RETURN_NOT_OK(CallAdjTime(&timex));
 
   // Calculate the sleep skew adjustment according to the max tolerance of the clock.
   // Tolerance comes in parts per million but needs to be applied a scaling factor.
   tolerance_adjustment_ = (1 + ((timex.tolerance / kAdjtimexScalingFactor) / 1000000.0));
 
-  LOG(INFO) << "HybridClock initialized. Resolution in nanos?: " << (divisor_
== 1000)
+  LOG(INFO) << "HybridClock initialized. "
             << " Wait times tolerance adjustment: " << tolerance_adjustment_
-            << " Current error: " << error_usec;
+            << " Current error: " << timex.maxerror << "us";
 #endif // defined(__APPLE__)
 
   state_ = kInitialized;
@@ -398,13 +368,19 @@ kudu::Status HybridClock::WalltimeWithError(uint64_t* now_usec, uint64_t*
error_
 #if defined(__APPLE__)
     *now_usec = GetCurrentTimeMicros();
     *error_usec = 0;
-  }
 #else
     // Read the time. This will return an error if the clock is not synchronized.
-    ntptimeval timeval;
-    RETURN_NOT_OK(GetClockTime(&timeval));
-    *now_usec = timeval.time.tv_sec * kNanosPerSec + timeval.time.tv_usec / divisor_;
-    *error_usec = timeval.maxerror;
+    timex tx;
+    RETURN_NOT_OK(CallAdjTime(&tx));
+
+    if (tx.status & STA_NANO) {
+      tx.time.tv_usec /= 1000;
+    }
+    DCHECK_LE(tx.time.tv_usec, 1e6);
+
+    *now_usec = tx.time.tv_sec * kMicrosPerSec + tx.time.tv_usec;
+    *error_usec = tx.maxerror;
+#endif // defined(__APPLE__)
   }
 
   // If the clock is synchronized but has max_error beyond max_clock_sync_error_usec
@@ -413,8 +389,8 @@ kudu::Status HybridClock::WalltimeWithError(uint64_t* now_usec, uint64_t*
error_
     return Status::ServiceUnavailable(Substitute("Error: Clock synchronized but error was"
         "too high ($0 us).", *error_usec));
   }
-#endif // defined(__APPLE__)
-  return kudu::Status::OK();
+
+return kudu::Status::OK();
 }
 
 void HybridClock::SetMockClockWallTimeForTests(uint64_t now_usec) {

http://git-wip-us.apache.org/repos/asf/kudu/blob/a873b8d1/src/kudu/server/hybrid_clock.h
----------------------------------------------------------------------
diff --git a/src/kudu/server/hybrid_clock.h b/src/kudu/server/hybrid_clock.h
index c2e1e62..cf42d40 100644
--- a/src/kudu/server/hybrid_clock.h
+++ b/src/kudu/server/hybrid_clock.h
@@ -184,10 +184,6 @@ class HybridClock : public Clock {
   // For testing purposes only.
   uint64_t mock_clock_max_error_usec_;
 
-#if !defined(__APPLE__)
-  uint64_t divisor_;
-#endif
-
   double tolerance_adjustment_;
 
   mutable simple_spinlock lock_;
@@ -203,7 +199,7 @@ class HybridClock : public Clock {
   // Mask to extract the pure logical bits.
   static const uint64_t kLogicalBitMask;
 
-  static const uint64_t kNanosPerSec;
+  static const uint64_t kMicrosPerSec;
 
   // The scaling factor used to obtain ppms. From the adjtimex source:
   // "scale factor used by adjtimex freq param.  1 ppm = 65536"


Mime
View raw message