kudu-user mailing list archives

From Lee King <yuyunliu...@gmail.com>
Subject Re: The service queue is full; it has 400 items.. Retrying in the next heartbeat period.
Date Mon, 06 Nov 2017 03:15:17 GMT
Hi, Todd
    I have changed consensus_rpc_timeout_ms from 1s to 30s, but it does
not seem to fix the problem either. Here is some error output from the
kudu-tserver.ERROR file (a sketch of the flag change follows the log):

E1106 11:10:46.077805 30426 consensus_queue.cc:428] T
f9eb9a8cd4cd4d26a9340928fb4e9327 P e817589db79348ad8722a697f1671720
[LEADER]: Error trying to read ahead of the log while preparing peer
request: Incomplete: Op with index 122 is ahead of the local log (next
sequential op: 122). Destination peer: Peer:
9229935dbf8f4f8f8f63b279fb0796ea, Is new: false, Last received: 149.122,
Next index: 123, Last known committed idx: 122, Last exchange result:
ERROR, Needs tablet copy: false
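
For reference, the flag change was applied roughly like this (a sketch;
the fs paths are placeholders for our real layout, and the flags can
equally live in a --flagfile):

  # restart each tablet server with the longer consensus RPC timeout
  kudu-tserver \
    --fs_wal_dir=/data/kudu/wal \
    --fs_data_dirs=/data/kudu/data \
    --consensus_rpc_timeout_ms=30000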


2017-11-04 13:15 GMT+08:00 Todd Lipcon <todd@cloudera.com>:

> One thing you might try is to update the consensus rpc timeout to 30
> seconds instead of 1. We changed the default in later versions.
>
> I'd also recommend updating to 1.4 or 1.5 for other related fixes to
> consensus stability. I think I recall you were on 1.3 still?
>
> Todd
>
>
> On Nov 3, 2017 7:47 PM, "Lee King" <yuyunliuhen@gmail.com> wrote:
>
> Hi,
>     Our Kudu cluster has run well for a long time, but writes have become
> slow recently, and clients are also hitting RPC timeouts. I checked the
> warnings and found vast numbers of errors like this:
> W1104 10:25:16.833736 10271 consensus_peers.cc:365] T
> 149ffa58ac274c9ba8385ccfdc01ea14 P 59c768eb799243678ee7fa3f83801316 ->
> Peer 1c67a7e7ff8f4de494469766641fccd1 (cloud-sk-ds-08:7050): Couldn't
> send request to peer 1c67a7e7ff8f4de494469766641fccd1 for tablet
> 149ffa58ac274c9ba8385ccfdc01ea14. Status: Timed out: UpdateConsensus RPC
> to 10.6.60.9:7050 timed out after 1.000s (SENT). Retrying in the next
> heartbeat period. Already tried 5 times.
>     I changed the configuration to rpc_service_queue_length=400 and
> rpc_num_service_threads=40, but it has no effect.
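>
>     (A sketch of that change; these lines go in the tserver's --flagfile,
> or equivalently on the kudu-tserver command line, followed by a restart:)
>
>       --rpc_service_queue_length=400
>       --rpc_num_service_threads=40
>
>     The values actually in effect should show up on each tserver's web UI
> (the /varz page, port 8050 by default).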
>     Our cluster includes 5 masters and 10 tablet servers, with 3800 GB of
> data and 800 tablets per tablet server. I checked one of the tablet server
> machines: 14 GB of memory free (128 GB in all), 4739 threads (max 32000),
> 28000 open files (max 65536), CPU utilization about 30% (32 cores), and
> disk util less than 30%.
>     Any suggestions for this? Thanks!
