kudu-user mailing list archives

From Lee King <yuyunliu...@gmail.com>
Subject Re: The service queue is full; it has 400 items.. Retrying in the next heartbeat period.
Date Wed, 15 Nov 2017 01:15:15 GMT
We disabled THP the day before yesterday, and the cluster started working
well again soon after. I have been watching our Spark jobs online for a
while now: there is no delay any more, and the task execution times have
stopped fluctuating. We also ran some tests with YCSB, and the results are
the same as on a freshly deployed cluster, or even better. I think THP was
the primary cause of the cluster's instability. We will keep observing, and
if anything interesting happens I will reply here. Thanks @Todd.

    King Lee
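
(For anyone checking their own boxes: the current THP state can be read
from sysfs. The exact path varies by kernel, so treat both paths below as
assumptions; many el6 kernels use the redhat_ prefixed one. The active
setting is shown in brackets, e.g. "always madvise [never]" once disabled.)

cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled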


On Tue, Nov 14, 2017 at 1:37 AM, Todd Lipcon <todd@cloudera.com> wrote:

> I caught up with King Lee on Slack over the weekend, but wanted to reply
> here so that if anyone else in the community hits similar issues they might
> check on the same things.
>
> - I asked for a dump of /metrics and /rpcz from the TS web server.
>
> From this I could see that a lot of the consensus requests were stuck a
> long time waiting on various locks.
>
> - I asked for 10 'pstack's separated by 10 seconds using the following
> command:
> for x in $(seq 1 10) ; do pstack $(pgrep kudu-tserver) > /tmp/stacks.$x.txt ; sleep 10 ; done
>
> In the stacks I found some interesting stuff:
>
> 1) Many threads in the process of leader elections
> - King Lee has an SSD available on his tablet server boxes, so I gave him
> a script to migrate the consensus-meta/ and tablet-meta/ directories to the
> SSD and replace them with symlinks on the original drive. This is basically
> a manual workaround for https://issues.apache.org/jira/browse/KUDU-1489
>
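A rough sketch of what such a migration could look like (this is not the
actual script from the thread; the data-directory paths, SSD mount point,
service name, and kudu user below are all assumptions, and the tablet
server must be stopped while the directories are moved):

sudo service kudu-tserver stop
# copy the metadata directories from the original data drive to the SSD
sudo cp -a /data1/kudu/consensus-meta /data1/kudu/tablet-meta /ssd/kudu/
# keep the originals as backups and point symlinks at the SSD copies
sudo mv /data1/kudu/consensus-meta /data1/kudu/consensus-meta.bak
sudo mv /data1/kudu/tablet-meta /data1/kudu/tablet-meta.bak
sudo ln -s /ssd/kudu/consensus-meta /data1/kudu/consensus-meta
sudo ln -s /ssd/kudu/tablet-meta /data1/kudu/tablet-meta
sudo chown -h kudu:kudu /data1/kudu/consensus-meta /data1/kudu/tablet-meta
sudo service kudu-tserver start

After the restart the tablet server resolves the same paths through the
symlinks, so only the physical location of the metadata changes.
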
> 2) A lot of threads blocked in DNS lookups
> - He's using /etc/hosts only and not actual DNS.
> - The stack looks like:
>
> Thread 770 (Thread 0x7fa8ecd0d700 (LWP 7365)):
> #0  0x00007fabc504e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fabc50495d8 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00007fabc50494a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007fabbcfe2e07 in _nss_files_gethostbyname2_r () from
> /lib64/libnss_files.so.2
> #4  0x00007fabc3fed915 in gethostbyname2_r@@GLIBC_2.2.5 () from
> /lib64/libc.so.6
> #5  0x00007fabc3fbad62 in gaih_inet () from /lib64/libc.so.6
> #6  0x00007fabc3fbcfbf in getaddrinfo () from /lib64/libc.so.6
> #7  0x0000000001b3bd06 in
> kudu::HostPort::ResolveAddresses(std::vector<kudu::Sockaddr,
> std::allocator<kudu::Sockaddr> >*) const ()
>
> It seems like at least this version of glibc (from el6) has a lock around
> the '/etc/hosts'-based implementation of gethostbyname. It still struck me
> as odd that looking up a host in /etc/hosts would be slow, but I also found
> this stack:
>
> Thread 496 (Thread 0x7fa65d7bb700 (LWP 28798)):
> #0  0x00007fabc3fcf59a in mmap64 () from /lib64/libc.so.6
> #1  0x00007fabc3f5015c in _IO_file_doallocate_internal () from
> /lib64/libc.so.6
> #2  0x00007fabc3f5d61c in _IO_doallocbuf_internal () from /lib64/libc.so.6
> #3  0x00007fabc3f5bbd4 in _IO_new_file_underflow () from /lib64/libc.so.6
> #4  0x00007fabc3f5d65e in _IO_default_uflow_internal () from
> /lib64/libc.so.6
> #5  0x00007fabc3f51d3a in _IO_getline_info_internal () from
> /lib64/libc.so.6
> #6  0x00007fabc3f5ab71 in fgets_unlocked () from /lib64/libc.so.6
> #7  0x00007fabbcfe1ab1 in internal_getent () from /lib64/libnss_files.so.2
> #8  0x00007fabbcfe2e96 in _nss_files_gethostbyname2_r () from
> /lib64/libnss_files.so.2
> #9  0x00007fabc3fed915 in gethostbyname2_r@@GLIBC_2.2.5 () from
> /lib64/libc.so.6
> #10 0x00007fabc3fbad62 in gaih_inet () from /lib64/libc.so.6
> #11 0x00007fabc3fbcfbf in getaddrinfo () from /lib64/libc.so.6
>
> I dug into the source for el6's glibc and found that, indeed, it uses
> mmap() to read /etc/hosts on every host resolution.
>
> This led me to question why mmap was so slow. The most common reason I've
> seen for that is when transparent hugepages are enabled on el6. It turns
> out King's system does have THP enabled, so I asked him to disable it. See
> https://issues.apache.org/jira/browse/KUDU-1447
>
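For reference, disabling THP at runtime generally looks like the following
(the sysfs paths depend on the kernel build, and el6 kernels usually expose
the same knobs under a redhat_ prefixed directory; a boot parameter such as
transparent_hugepage=never is what makes the change survive a reboot):

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
# on el6 kernels the equivalent paths are usually:
echo never | sudo tee /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/redhat_transparent_hugepage/defrag
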
> - The last thing I noticed were a bunch of threads stuck in this stack:
>
> Thread 54 (Thread 0x7fabb95e5700 (LWP 29901)):
> #0  0x00007fabc504e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fabc50495d8 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00007fabc50494a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000001b3669f in kudu::Mutex::Acquire() ()
> #4  0x0000000001b17193 in kudu::KernelStackWatchdog::Unregister() ()
> #5  0x0000000001b1736a in void
> kudu::threadlocal::internal::Destroy<kudu::KernelStackWatchdog::TLS>(void*)
> ()
> #6  0x0000000001b6c316 in
> kudu::threadlocal::internal::InvokeDestructors(void*) ()
> #7  0x00007fabc5047b39 in start_thread () from /lib64/libpthread.so.0
> #8  0x00007fabc3fd2bcd in clone () from /lib64/libc.so.6
>
> I filed KUDU-2215 for this one. This particular stuck thread issue
> wouldn't cause problems in Kudu 1.5, but in 1.4 and earlier the slow
> shutdown of a thread could cause consensus to get "stuck" as well in a
> stack like this:
> Thread 2137 (Thread 0x7fa4e4c1c700 (LWP 1564)):
> #0  0x00007fabc504ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x0000000001ad9cdd in
> kudu::ConditionVariable::TimedWait(kudu::MonoDelta const&) const ()
> #2  0x0000000001b63d0c in kudu::ThreadJoiner::Join() ()
> #3  0x0000000001b55b4c in kudu::ResettableHeartbeater::Stop() ()
> #4  0x0000000000a65b42 in kudu::consensus::Peer::Close() ()
> #5  0x0000000000a819b9 in kudu::consensus::PeerManager::Close() ()
> #6  0x0000000000a45dde in
> kudu::consensus::RaftConsensus::BecomeReplicaUnlocked() ()
> #7  0x0000000000a51063 in
> kudu::consensus::RaftConsensus::HandleTermAdvanceUnlocked(long,
> kudu::consensus::ReplicaState::FlushToDisk) ()
> #8  0x0000000000a5145f in
> kudu::consensus::RaftConsensus::NotifyTermChange(long) ()
> #9  0x0000000000a6ceac in
> kudu::consensus::PeerMessageQueue::NotifyObserversOfTermChangeTask(long) ()
>
> which I also found in his stacks fairly often.
>
> To work around this I suggested disabling the kernel_stack_watchdog using a
> hidden configuration. I'll look at a fix for this that we can backport to
> earlier versions as well.
>
> King -- please report back in a couple days whether the suggestions we
> talked about help resolve the problems on your cluster.
>
> Thanks
> -Todd
>
> On Thu, Nov 9, 2017 at 11:27 PM, Lee King <yuyunliuhen@gmail.com> wrote:
>
>> Hi, Todd
>>     Our Kudu cluster's error/warning logs look just like
>> https://issues.apache.org/jira/browse/KUDU-1078, and that issue's status is
>> reopened. I have uploaded the logs to help analyze the issue; if you want
>> more detail, just tell me 😄.
>> log files:
>> https://drive.google.com/open?id=1_1l2xpT3-NmumgI_sIdxch-6BocXqTCt
>>
>> https://drive.google.com/open?id=0B4-NyGFtYNboN3NYNW1pVWQwcFVLa083VkRIUTZHRk85WHY4
>>
>> 2017-11-04 13:15 GMT+08:00 Todd Lipcon <todd@cloudera.com>:
>>
>>> One thing you might try is to update the consensus rpc timeout to 30
>>> seconds instead of 1. We changed the default in later versions.
>>>
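If memory serves, the setting behind that timeout is the
consensus_rpc_timeout_ms gflag on the tablet servers (value in
milliseconds); treat the exact name as an assumption and confirm it against
the flag list of your Kudu version. Roughly:

# added to the tablet server's gflag file, followed by a tserver restart
--consensus_rpc_timeout_ms=30000
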
>>> I'd also recommend updating to 1.4 or 1.5 for other related fixes to
>>> consensus stability. I think I recall you were on 1.3 still?
>>>
>>> Todd
>>>
>>>
>>> On Nov 3, 2017 7:47 PM, "Lee King" <yuyunliuhen@gmail.com> wrote:
>>>
>>> Hi,
>>>     Our Kudu cluster has run well for a long time, but writes have become
>>> slow recently, and clients are also hitting RPC timeouts. I checked the
>>> warning logs and found a vast number of errors like this:
>>> W1104 10:25:16.833736 10271 consensus_peers.cc:365] T
>>> 149ffa58ac274c9ba8385ccfdc01ea14 P 59c768eb799243678ee7fa3f83801316 -> Peer
>>> 1c67a7e7ff8f4de494469766641fccd1 (cloud-sk-ds-08:7050): Couldn't send
>>> request to peer 1c67a7e7ff8f4de494469766641fccd1 for tablet
>>> 149ffa58ac274c9ba8385ccfdc01ea14. Status: Timed out: UpdateConsensus RPC to
>>> 10.6.60.9:7050 timed out after 1.000s (SENT). Retrying in the next
>>> heartbeat period. Already tried 5 times.
>>>     I changed the configuration to rpc_service_queue_length=400 and
>>> rpc_num_service_threads=40, but it had no effect.
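Those two settings are ordinary tablet-server gflags; a minimal sketch of
how they would appear in the tserver gflag file (the file location below is
an assumption, adjust for your deployment, and a tserver restart is needed
to pick them up):

# e.g. /etc/kudu/conf/tserver.gflagfile
--rpc_service_queue_length=400
--rpc_num_service_threads=40
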
>>>     Our cluster has 5 masters and 10 tablet servers, 3800 GB of data, and
>>> 800 tablets per tablet server. On one of the tablet server machines I
>>> checked: 14 GB of memory free (128 GB in all), 4739 threads (max 32000),
>>> 28000 open files (max 65536), CPU utilization around 30% (32 cores), and
>>> disk util less than 30%.
>>>     Any suggestions for this? Thanks!
>>>
>>>
>>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
