impala-dev mailing list archives

From Matthew Jacobs ...@cloudera.com>
Subject Re: Kudu start error with low ntpdate "maximum error"
Date Wed, 16 Nov 2016 18:20:47 GMT
I asked on the Kudu Slack channel; they have seen issues where freshly
provisioned ec2 nodes take some time for ntp to quiesce, but they
didn't have a sense of how long that might take. If you checked
ntptime after the job failed, it may be that ntp had enough time. We
can probably consider bumping up the allowable error.
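
A minimal sketch of one way to wait for ntp to settle before starting Kudu, rather than (or in addition to) raising the allowable error. It parses the "returns code" and "maximum error" fields from ntptime output like that quoted below and compares them with the 10 second default from the Kudu troubleshooting page. The helper names (parse_ntptime, clock_ok, wait_for_clock), the timeout and poll interval, and the treatment of return code 5 as "unsynchronized" are assumptions for illustration; this is not how Kudu itself performs the check.

```python
import re
import subprocess
import time

# Default from the Kudu troubleshooting text quoted in this thread (10 seconds).
DEFAULT_MAX_ERROR_US = 10_000_000
# Assumption: kernel return code 5 (TIME_ERROR) means the clock is unsynchronized.
TIME_ERROR = 5


def parse_ntptime(output):
    """Return (return_code, max_error_us) from ntptime output, or None if unparseable."""
    code = re.search(r"ntp_gettime\(\) returns code (\d+)", output)
    err = re.search(r"maximum error (\d+) us", output)
    if code is None or err is None:
        return None
    return int(code.group(1)), int(err.group(1))


def clock_ok(output, max_error_us=DEFAULT_MAX_ERROR_US):
    """True if the output reports a synchronized clock with maximum error under the limit."""
    parsed = parse_ntptime(output)
    if parsed is None:
        return False
    code, err = parsed
    return code != TIME_ERROR and err < max_error_us


def wait_for_clock(timeout_s=300, poll_s=5, max_error_us=DEFAULT_MAX_ERROR_US):
    """Poll ntptime until clock_ok() passes or timeout_s elapses (hypothetical helper)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Raises FileNotFoundError if ntptime is not installed.
        result = subprocess.run(["ntptime"], capture_output=True, text=True)
        if clock_ok(result.stdout, max_error_us):
            return True
        time.sleep(poll_s)
    return False
```

A provisioning script could call wait_for_clock() before launching the Kudu daemons and log the last ntptime output if it returns False, which would also tell us how long ntp actually takes on a fresh ec2 node.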

On Wed, Nov 16, 2016 at 9:24 AM, Jim Apple <jbapple@cloudera.com> wrote:
> This is the second time I have seen it, but it doesn't happen every
> time. It could very well be a difference on ec2; already I've seen
> some bugs due to my ec2 instances being Etc/UTC timezone while most
> Impala developers work in America/Los_Angeles.
>
> On Wed, Nov 16, 2016 at 9:10 AM, Matthew Jacobs <mj@cloudera.com> wrote:
>> No problem. If this happens again we should ask the Kudu developers. I
>> haven't seen this before - I wonder if it could be some weirdness on
>> ec2...
>>
>> Thanks
>>
>> On Wed, Nov 16, 2016 at 9:01 AM, Jim Apple <jbapple@cloudera.com> wrote:
>>> Thank you for your help!
>>>
>>> This was on an AWS machine that has expired, but I can see from the
>>> logs that "IMPALA_KUDU_VERSION=88b023" and
>>> "KUDU_JAVA_VERSION=1.0.0-SNAPSHOT" and "Downloading
>>> kudu-python-0.3.0.tar.gz" and "URL
>>> https://native-toolchain.s3.amazonaws.com/build/264-e9d44349ba/kudu/88b023-gcc-4.9.2/kudu-88b023-gcc-4.9.2-ec2-package-ubuntu-14-04.tar.gz".
>>> I'll add "ps aux | grep kudu" to the logging this machine does on
>>> error, so we'll have it next time, but I did "ps -Afly" on exit and
>>> there were no kudu processes running, it looks like.
>>>
>>> On Wed, Nov 16, 2016 at 8:52 AM, Matthew Jacobs <mj@cloudera.com> wrote:
>>>> Can you check which version of the client you're building against
>>>> (KUDU_VERSION env var) vs what Kudu version is running (ps aux | grep
>>>> kudu)?
>>>>
>>>> On Wed, Nov 16, 2016 at 8:48 AM, Jim Apple <jbapple@cloudera.com> wrote:
>>>>> Yes.
>>>>>
>>>>> On Wed, Nov 16, 2016 at 7:45 AM, Matthew Jacobs <mj@cloudera.com> wrote:
>>>>>> Do you have NTP installed?
>>>>>>
>>>>>> On Tue, Nov 15, 2016 at 9:22 PM, Jim Apple <jbapple@cloudera.com> wrote:
>>>>>>> I have a machine where Kudu failed to start:
>>>>>>>
>>>>>>> F1116 05:02:00.173629 71098 tablet_server_main.cc:64] Check failed:
>>>>>>> _s.ok() Bad status: Service unavailable: Cannot initialize clock:
>>>>>>> Error reading clock. Clock considered unsynchronized
>>>>>>>
>>>>>>> https://kudu.apache.org/docs/troubleshooting.html says:
>>>>>>>
>>>>>>> "For the master and tablet server daemons, the server’s clock must be
>>>>>>> synchronized using NTP. In addition, the maximum clock error (not to
>>>>>>> be mistaken with the estimated error) be below a configurable
>>>>>>> threshold. The default value is 10 seconds, but it can be set with the
>>>>>>> flag --max_clock_sync_error_usec."
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>> "If NTP is installed the user can monitor the synchronization status
>>>>>>> by running ntptime. The relevant value is what is reported for maximum
>>>>>>> error."
>>>>>>>
>>>>>>> ntptime reports:
>>>>>>>
>>>>>>> ntp_gettime() returns code 0 (OK)
>>>>>>>   time dbd66a6a.59bca948  Wed, Nov 16 2016  5:17:30.350, (.350535824),
>>>>>>>   maximum error 197431 us, estimated error 71015 us, TAI offset 0
>>>>>>> ntp_adjtime() returns code 0 (OK)
>>>>>>>   modes 0x0 (),
>>>>>>>   offset 74989.459 us, frequency 19.950 ppm, interval 1 s,
>>>>>>>   maximum error 197431 us, estimated error 71015 us,
>>>>>>>   status 0x2001 (PLL,NANO),
>>>>>>>   time constant 6, precision 0.001 us, tolerance 500 ppm,
>>>>>>>
>>>>>>> So it looks like this error is anticipated, but the expected
>>>>>>> conditions for it to occur are absent. Any ideas what could be going
>>>>>>> on here? This is with a recent checkout of Impala master.
