kudu-user mailing list archives

From Lee King <yuyunliu...@gmail.com>
Subject Re: The service queue is full; it has 400 items.. Retrying in the next heartbeat period.
Date Fri, 10 Nov 2017 07:27:41 GMT
Hi Todd,
    Our Kudu cluster's error/warning logs look just like those in
https://issues.apache.org/jira/browse/KUDU-1078, and that issue's status is
Reopened. I have uploaded logs so the issue can be analyzed. If you want more
detail, just tell me 😄.
Log files:
https://drive.google.com/open?id=1_1l2xpT3-NmumgI_sIdxch-6BocXqTCt
https://drive.google.com/open?id=0B4-NyGFtYNboN3NYNW1pVWQwcFVLa083VkRIUTZHRk85WHY4

2017-11-04 13:15 GMT+08:00 Todd Lipcon <todd@cloudera.com>:

> One thing you might try is to update the consensus rpc timeout to 30
> seconds instead of 1. We changed the default in later versions.
>
> I'd also recommend upgrading to 1.4 or 1.5 for other related fixes to
> consensus stability. I think I recall you were still on 1.3?
>
> Todd
>
>
> On Nov 3, 2017 7:47 PM, "Lee King" <yuyunliuhen@gmail.com> wrote:
>
> Hi,
>     Our Kudu cluster has run well for a long time, but writes have become
> slow recently, and clients are also hitting RPC timeouts. I checked the
> warnings and found a large number of errors like this:
> W1104 10:25:16.833736 10271 consensus_peers.cc:365] T
> 149ffa58ac274c9ba8385ccfdc01ea14 P 59c768eb799243678ee7fa3f83801316 ->
> Peer 1c67a7e7ff8f4de494469766641fccd1 (cloud-sk-ds-08:7050): Couldn't
> send request to peer 1c67a7e7ff8f4de494469766641fccd1 for tablet
> 149ffa58ac274c9ba8385ccfdc01ea14. Status: Timed out: UpdateConsensus RPC
> to 10.6.60.9:7050 timed out after 1.000s (SENT). Retrying in the next
> heartbeat period. Already tried 5 times.
>     I changed the configuration to rpc_service_queue_length=400 and
> rpc_num_service_threads=40, but it had no effect.
>     Our cluster includes 5 masters and 10 tablet servers, with 3800G of data
> and 800 tablets per tablet server. I checked one of the tablet server
> machines: 14G of memory free (128G in total), 4739 threads (max 32000),
> 28000 open files (max 65536), CPU utilization about 30% (32 cores), and
> disk utilization less than 30%.
>     Any suggestions? Thanks!
>
>
>
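
For anyone reading the uploaded logs, here is a rough Python sketch (not part of Kudu) that tallies the UpdateConsensus timeout warnings by destination address and by tablet, which may help show whether the timeouts cluster on one tablet server. The regular expression is modelled only on the single warning quoted above, and it assumes each warning sits on one line in the real log file (the mail client wrapped it here). The function name count_timeouts is made up for illustration.

```python
import re
from collections import Counter

# Pattern based on the one consensus_peers.cc warning quoted in this thread.
# Assumption: other Kudu versions may word the message differently.
PEER_TIMEOUT = re.compile(
    r"Couldn't send request to peer (?P<peer>[0-9a-f]+) "
    r"for tablet (?P<tablet>[0-9a-f]+)\. "
    r"Status: Timed out: UpdateConsensus RPC to (?P<addr>\S+) timed out"
)


def count_timeouts(lines):
    """Return (by_addr, by_tablet) Counters of UpdateConsensus timeouts."""
    by_addr = Counter()
    by_tablet = Counter()
    for line in lines:
        match = PEER_TIMEOUT.search(line)
        if match:
            by_addr[match.group("addr")] += 1
            by_tablet[match.group("tablet")] += 1
    return by_addr, by_tablet
```

To use it, pass an open log file (or any iterable of lines) to count_timeouts and print by_addr.most_common(10). If one address dominates the counts, that tablet server is probably the one whose service queue is overflowing.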
