kudu-user mailing list archives

From Franco Venturi <fvent...@comcast.net>
Subject Error message: 'Tried to update clock beyond the max. error.'
Date Wed, 01 Nov 2017 02:12:55 GMT

A few days ago at work our Kudu servers started having fatal errors and shutting down with
the following error message: 

Couldn't get the current time: Clock unsynchronized. Status: Service unavailable: Error: Clock
synchronized but error was too high (10000016 us). 

After some research in the community forums, I found a post by Todd that pointed to this JIRA
issue: https://issues.apache.org/jira/browse/KUDU-2079 

I then checked our ntpd configuration and sure enough we had the '-x' option in the daemon
command, so I went ahead, removed that option, restarted ntpd, and a few minutes later I restarted
all the Kudu processes (one master and three tablet servers). 
A few minutes later a couple of those Kudu processes were down again, this time with this
new time sync related error message: 

Tried to update clock beyond the max. error. 

To try to address this new error, I brought down all the Kudu processes, stopped ntpd, resync'd
the time on all the servers with ntpdate, brought ntpd back up, waited a bit, and restarted
Kudu (master and tablet servers). Within a few minutes a couple of them were down again
with the same 'Tried to update clock beyond the max. error.' message. 

I eventually ended up doubling the 'max_clock_sync_error_usec' flag to 20,000,000 microseconds
(20 seconds), and everything stayed up (and is still up). 

Looking at the source code in git, I found the relevant section here (source file https://github.com/apache/kudu/blob/master/src/kudu/clock/hybrid_clock.cc):

  // we won't update our clock if to_update is more than 'max_clock_sync_error_usec'
  // into the future as it might have been corrupted or originated from an out-of-sync
  // server.
  if ((to_update_physical - now_physical) > FLAGS_max_clock_sync_error_usec) {
    return Status::InvalidArgument("Tried to update clock beyond the max. error.");
  }

If I understand this code correctly, it is complaining because for some reason Kudu is trying
to update its clock by more than 10 seconds. However, I ran ntptime and several ntpq queries,
and I don't see the time between the servers being off by that much (or even by, say, half a
second, since they are all synchronized with a stratum 3 NTP server). 
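
The check quoted above can be reproduced in isolation to see exactly when it fires. What follows is a minimal sketch, not the Kudu implementation: the function name CheckClockUpdate and the constant kDefaultMaxClockSyncErrorUsec are hypothetical, the 10,000,000 us default is inferred from the 10-second limit mentioned in the error above, and both physical times are assumed to be microsecond values on the same time base.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for FLAGS_max_clock_sync_error_usec.
// Assumed default: 10 seconds, expressed in microseconds.
constexpr int64_t kDefaultMaxClockSyncErrorUsec = 10000000;

// Mirrors the comparison in the quoted snippet: the update is rejected only
// when the incoming physical time is more than max_error_usec ahead of the
// local physical time. Returns an empty string when the update is accepted,
// otherwise the error text.
std::string CheckClockUpdate(
    int64_t now_physical_usec,
    int64_t to_update_physical_usec,
    int64_t max_error_usec = kDefaultMaxClockSyncErrorUsec) {
  if ((to_update_physical_usec - now_physical_usec) > max_error_usec) {
    return "Tried to update clock beyond the max. error.";
  }
  return "";
}
```

Under these assumptions, CheckClockUpdate(0, 10000001) returns the error text, while CheckClockUpdate(0, 15000000, 20000000) returns an empty string, which matches the observation that doubling the limit kept the servers up. Note that the comparison is one-sided: a timestamp that lies in the past relative to the local clock never triggers it.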

Has anyone in this group seen anything similar or does anyone have a better understanding
of what this message means and what could be causing it? 

