kudu-user mailing list archives

From Zhen Zhang <zhqu...@gmail.com>
Subject Scan timeout and slow sync call on TS side
Date Fri, 26 Jan 2018 08:49:30 GMT
Hi Kudu Users,

We use Kudu 1.3 and hit the following exception yesterday:

2018-01-25,20:14:42,001 WARN org.apache.kudu.client.AsyncKuduScanner:
1042e82c70594e3aaca741b686aa91fa pretends to not know
scannerId="14f926d1384f4d848f62a7e701d2d623", scanRequestTimeout=30000)
org.apache.kudu.client.NonRecoverableException: Invalid call sequence ID in
scan request
at org.apache.kudu.client.TabletClient.handleUpstream(TabletClient.java:595)

The following seems to be the corresponding log on the tablet server (TS) side:

W0125 20:14:43.624604 69076 rpcz_store.cc:234] Call
kudu.tserver.TabletServerService.Scan from (request call
id 57) took 11635ms (client timeout 10000).
W0125 20:14:43.624671 69076 rpcz_store.cc:238] Trace:
0125 20:14:31.989201 (+     0us) service_pool.cc:143] Inserting onto call
0125 20:14:31.989249 (+    48us) service_pool.cc:202] Handling call
0125 20:14:31.989532 (+   283us) tablet_service.cc:1669] Found scanner
0125 20:14:43.621814 (+11632282us) tablet_service.cc:1728] Deadline expired
- responding early
0125 20:14:43.624597 (+  2783us) inbound_call.cc:130] Queueing success

It seems Kudu spent too much time processing the scan request. The CPU load
at that time was about 7 to 10, and the machine has 36 cores.
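To see where the time went inside the call, the per-step deltas in the trace can be summed and ranked. The following is only a rough sketch: the regular expression and the helper name parse_steps are my own assumptions for illustration, not part of Kudu, and the trace text is copied from the log above with the wrapped lines rejoined.

```python
import re

# Trace lines copied from the rpcz_store.cc output above (wrapped lines rejoined).
TRACE = """\
0125 20:14:31.989201 (+     0us) service_pool.cc:143] Inserting onto call
0125 20:14:31.989249 (+    48us) service_pool.cc:202] Handling call
0125 20:14:31.989532 (+   283us) tablet_service.cc:1669] Found scanner
0125 20:14:43.621814 (+11632282us) tablet_service.cc:1728] Deadline expired - responding early
0125 20:14:43.624597 (+  2783us) inbound_call.cc:130] Queueing success
"""

# Hypothetical pattern: "(+<delta>us) <file:line>] <message>"
STEP = re.compile(r"\(\+\s*(\d+)us\)\s+(\S+)\]\s+(.*)")


def parse_steps(trace):
    """Return a list of (delta_us, source_location, message) tuples."""
    steps = []
    for line in trace.splitlines():
        match = STEP.search(line)
        if match:
            steps.append((int(match.group(1)), match.group(2), match.group(3)))
    return steps


steps = parse_steps(TRACE)
total_us = sum(delta for delta, _, _ in steps)
slowest = max(steps, key=lambda step: step[0])

print(f"total: {total_us / 1000:.0f} ms")
print(f"slowest step ends at {slowest[1]}: {slowest[0] / 1000:.0f} ms "
      f"({slowest[0] / total_us:.2%} of the call)")
```

The deltas add up to the 11635ms reported for the call, and almost all of it falls between "Found scanner" and "Deadline expired", i.e. inside the scan itself rather than in queueing.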

I also found warnings like these in the server-side log:

W0125 20:14:31.228502 68713 env_posix.cc:682] Time spent sync call for
real 1.799s   user 0.000s     sys 0.000s
W0125 20:14:32.640887 68713 env_posix.cc:682] Time spent sync call for
real 1.410s   user 0.000s     sys 0.000s
W0125 20:14:34.116614 68716 env_posix.cc:682] Time spent sync call for
real 5.792s      user 0.000s     sys 0.001s
W0125 20:14:35.665977 68713 env_posix.cc:682] Time spent sync call for
real 2.900s   user 0.000s     sys 0.000s
W0125 20:14:39.278427 68713 env_posix.cc:682] Time spent sync call for
real 1.650s   user 0.000s     sys 0.000s

Some sync calls also seem slow; for example,
21b28a6ca38548b7afab09324ecf559a took more than 5 seconds to sync. I
looked into this tablet further and found it has about 110GB of on-disk data.
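One way to check whether the slow syncs coincide with the slow scan is to compare their time ranges. This is a minimal sketch under one assumption that I have not verified: that each env_posix.cc warning is logged when the sync finishes, so the sync ran from (log time minus "real") to the log time. The helper names are made up for illustration, and the timestamps are copied from the logs above.

```python
from datetime import datetime, timedelta

# (log timestamp, "real" seconds) from the env_posix.cc warnings above.
SYNC_WARNINGS = [
    ("20:14:31.228502", 1.799),
    ("20:14:32.640887", 1.410),
    ("20:14:34.116614", 5.792),
    ("20:14:35.665977", 2.900),
    ("20:14:39.278427", 1.650),
]

# Scan window from the trace: "Found scanner" to "Deadline expired".
SCAN_START = "20:14:31.989532"
SCAN_END = "20:14:43.621814"


def parse_time(text):
    """Parse an HH:MM:SS.ffffff log timestamp (the date is irrelevant here)."""
    return datetime.strptime(text, "%H:%M:%S.%f")


def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(0.0, (earliest_end - latest_start).total_seconds())


scan_start, scan_end = parse_time(SCAN_START), parse_time(SCAN_END)

overlaps = []
for logged_at, real_seconds in SYNC_WARNINGS:
    sync_end = parse_time(logged_at)
    # Assumption: the warning is emitted at the end of the sync call.
    sync_start = sync_end - timedelta(seconds=real_seconds)
    overlaps.append(overlap_seconds(sync_start, sync_end, scan_start, scan_end))

for (logged_at, real_seconds), seconds in zip(SYNC_WARNINGS, overlaps):
    print(f"sync logged at {logged_at} ({real_seconds:.3f}s): "
          f"{seconds:.3f}s inside the scan window")
```

Under that assumption, four of the five slow syncs fall at least partly inside the scan window. This only shows that they happened at the same time, not that one caused the other.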

I wonder whether this huge tablet is causing all of the problems, including
the slow syncs and the slow scan. However, the sync and the scan involve
tablets of two different tables, and the CPU load does not seem high, so I'm
not sure. What do you think?

