kudu-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Changing number of Kudu worker threads
Date Wed, 13 Feb 2019 16:39:39 GMT
Some comments on the original problem: "we need to process 1000s of
operations per second and noticed that our Kudu 1.5 cluster was only using
10 threads while our application spins up 50 clients/threads"

I wouldn't directly infer that 20 threads won't be enough to match your
needs. The time it takes to service a request can vary greatly: in one
second, a single thread could process 500 operations that take 2ms each, or
2 that take 500ms each, and you have 20 of those threads. The queue is there
to make sure
that the threads are kept busy instead of bouncing the clients back the
moment all the threads are occupied. Your 50 threads can't constantly pound
all the tservers; there's time spent on the network and on whatever processing
needs to happen client-side before they go back to Kudu.
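
To make that arithmetic concrete, here is a rough back-of-the-envelope
sketch in Python. It is not Kudu code: the function name is made up for
illustration, and it assumes every thread stays busy with requests that all
take the same time, which a real workload won't do.

```python
def max_ops_per_second(num_threads: int, service_time_ms: float) -> float:
    """Upper bound on throughput if every service thread is always busy."""
    ops_per_thread = 1000.0 / service_time_ms  # operations one thread finishes per second
    return num_threads * ops_per_thread

# 20 service threads, as in the tablet server default discussed in this thread.
fast = max_ops_per_second(20, 2)    # requests that take 2 ms each
slow = max_ops_per_second(20, 500)  # requests that take 500 ms each
print(fast, slow)
```

The same 20 threads land anywhere between tens and thousands of operations
per second depending only on how long each request takes, so the thread
count alone doesn't say much.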

TBH there's not a whole lot of science around how we set those two defaults
(# of threads and queue size), but it's very workload-dependent. Ideally
the tservers would just right-size the pools based on the kind of requests
that are coming in and the amount of memory they can use. I guess CPU also
comes into the picture, but again it depends on the workload: Kudu stores data,
so it tends to be IO-bound more than CPU-bound.

But the memory concern is very real. To be put in the queue the requests
must be read from the network, so it doesn't take that many 2MB batches of
inserts to occupy a lot of memory. Scans, on the other hand, become a
memory concern in the threads because that's where they materialize data in
memory and, depending on the number of columns scanned and the kind of data
that's read, it could be a lot. That's why the defaults aren't arbitrarily
high; they're more on the safe side.
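
As a rough illustration of the queue side of that memory point, here's a
worst-case estimate in Python. It's only a sketch: the function name is
hypothetical, the queue length of 50 is the value mentioned elsewhere in
this thread, and it assumes every queued slot holds a full 2MB insert batch
while ignoring per-request overhead and the memory used by scans.

```python
def queued_request_memory_mb(queue_length: int, request_size_mb: float) -> float:
    """Worst-case memory held by requests waiting in one RPC service queue."""
    return queue_length * request_size_mb

# Queue of 50 (the value quoted in this thread) versus a doubled queue of 100,
# both filled with 2 MB insert batches.
default_queue = queued_request_memory_mb(50, 2)
doubled_queue = queued_request_memory_mb(100, 2)
print(default_queue, doubled_queue)
```

Doubling the queue doubles the worst case, and that is per tserver, before
counting whatever the service threads themselves are holding.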

Have you actually encountered performance issues that you could trace back
to this?

Thanks,

J-D

On Wed, Feb 13, 2019 at 3:49 AM Boris <boriskey@gmail.com> wrote:

> But if we bump threads count to 50, and queue default is 50, we should
> probably bump queue to 100 or something like that, right?
>
> On Wed, Feb 13, 2019, 00:54 Hao Hao <hao.hao@cloudera.com> wrote:
>
>> I don't see other flags that are relevant here, maybe others can chime
>> in.
>>
>> As for --rpc_service_queue_length, it configures the size of the RPC
>> request queues. The queue helps to buffer requests in case a bunch of
>> them arrive at once while the service threads are busy processing requests
>> that arrived earlier. But I don't see how it can help with handling more
>> concurrent requests.
>>
>> Best,
>> Hao
>>
>> On Tue, Feb 12, 2019 at 6:45 PM Boris <boriskey@gmail.com> wrote:
>>
>>> Thanks Hao, appreciate your response.
>>>
>>> Do we also need to bump other RPC-thread-related parameters (queue, etc.)?
>>>
>>> On Tue, Feb 12, 2019, 21:09 Hao Hao <hao.hao@cloudera.com> wrote:
>>>
>>>> Hi Boris,
>>>>
>>>> Sorry for the delay. --rpc_num_service_threads sets the number of
>>>> threads in the RPC service thread pool (the default is 20 for the tablet
>>>> server, 10 for the master). It should help with processing concurrent
>>>> incoming RPC requests, but increasing it beyond the number of available
>>>> CPU cores on the machine may not bring much value.
>>>>
>>>> You don't need to set the same value for masters and tablet servers.
>>>> Most of the time, tablet servers should see more RPCs, since that is
>>>> where the scans and writes take place.
>>>>
>>>> Best,
>>>> Hao
>>>>
>>>> On Tue, Feb 12, 2019 at 5:29 PM Boris Tyukin <boris@boristyukin.com>
>>>> wrote:
>>>>
>>>>> Can someone point us to documentation or explain what these parameters
>>>>> really mean and how they should be set on a production cluster?
>>>>> I will greatly appreciate it!
>>>>>
>>>>> Boris
>>>>>
>>>>> On Fri, Feb 8, 2019 at 3:40 PM Boris Tyukin <boris@boristyukin.com>
>>>>> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> we need to process 1000s of operations per second and noticed that
>>>>>> our Kudu 1.5 cluster was only using 10 threads while our application
>>>>>> spins up 50 clients/threads. We observed in the web UI that only 10
>>>>>> threads are working and other 40 waiting in the queue.
>>>>>>
>>>>>> We found rpc_num_service_threads parameter in the configuration guide
>>>>>> but it is still not clear to me what we need to adjust exactly to
>>>>>> allow Kudu to handle more concurrent operations.
>>>>>>
>>>>>> Do we bump this parameter below or we need to consider other
>>>>>> rpc related parameters?
>>>>>>
>>>>>> Also do we need to use the same numbers for Masters and tablets?
>>>>>>
>>>>>> Are there any good numbers to target based on CPU core count?
>>>>>>
>>>>>> --rpc_num_service_threads
>>>>>> <https://kudu.apache.org/docs/configuration_reference.html#kudu-master_rpc_num_service_threads>
>>>>>>
>>>>>> Number of RPC worker threads to run
>>>>>>
>>>>>> Type: int32
>>>>>> Default: 10
>>>>>> Tags: advanced
>>>>>>
>>>>>
