cassandra-user mailing list archives

From Avi Kivity
Subject Re: scylladb
Date Sat, 11 Mar 2017 21:43:06 GMT
There are several issues at play here.

First, a database runs a large number of concurrent operations, each of 
which only consumes a small amount of CPU. The high concurrency is needed
to hide latency: disk latency, or the latency of contacting a remote 
node. This means that the scheduler will need to switch contexts very 
often. A kernel thread scheduler knows very little about the
application, so it has to save and restore a lot of context on every
switch.  A user-level scheduler is tightly bound to the application, so
it can perform the switching faster.  There are also implications for
the concurrency
primitives in use (locks etc.) -- they will be much faster for the 
user-level scheduler, because they cooperate with the scheduler.  For 
example, no atomic read-modify-write instructions need to be executed.
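
A minimal sketch of the point about locks, assuming a single-threaded
cooperative scheduler built on Python generators. The names CoopLock,
worker and run are hypothetical and are not Seastar's API. Tasks are
switched only at explicit yield points, so the lock can be a plain
boolean flag and needs no atomic read-modify-write operation.

```python
from collections import deque


class CoopLock:
    """Lock for tasks that run on one cooperative scheduler.

    A task can only be switched out at an explicit `yield`, so testing
    and setting `held` cannot be interleaved with another task.
    A plain boolean is therefore enough.
    """

    def __init__(self):
        self.held = False

    def acquire(self):
        # Generator: hand control back to the scheduler while the lock
        # is held by another task, then take it.
        while self.held:
            yield
        self.held = True

    def release(self):
        self.held = False


def worker(name, lock, log, steps):
    """Hypothetical task: enters a critical section `steps` times."""
    for i in range(steps):
        yield from lock.acquire()
        log.append((name, "enter", i))
        yield  # stands in for waiting on disk or a remote node
        log.append((name, "leave", i))
        lock.release()
        yield  # let other tasks run between iterations


def run(tasks):
    """Round-robin user-level scheduler; returns the number of switches."""
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished, drop it
        ready.append(task)
        switches += 1
    return switches


log = []
lock = CoopLock()
run([worker("a", lock, log, 2), worker("b", lock, log, 2)])
```

Every "enter" entry in log is followed directly by the matching "leave"
entry from the same task, even though each task gives up the CPU while
holding the lock.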

Second, how many (kernel) threads should you run?  If you run too few 
threads, then you will not be able to saturate the CPU resources.  This 
is a common problem with Cassandra -- it's very hard to get it to 
consume all of the CPU power on even a moderately large machine.  On the 
other hand, if you have too many threads, you will see latency rise very 
quickly, because kernel scheduling granularity is on the order of 
milliseconds. User-level scheduling, because it leaves control in the
hands of the application, allows you to both saturate the CPU and
maintain low latency.
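
A rough worked example of the latency effect, assuming a simplified
round-robin model in which every runnable thread on a core uses its
full timeslice. The function name and the 3 ms timeslice are
assumptions for illustration; real kernel schedulers are more
elaborate.

```python
def worst_case_wait_ms(runnable_threads_per_core, timeslice_ms):
    """Longest time a ready thread may wait before it runs again.

    Simplified model (an assumption, not how any specific kernel
    behaves): strict round-robin, each other thread runs for one full
    timeslice before our thread gets the CPU back.
    """
    return (runnable_threads_per_core - 1) * timeslice_ms


# One thread per core: no waiting behind other threads.
one_per_core = worst_case_wait_ms(1, 3.0)

# Nine runnable threads per core with an assumed 3 ms timeslice.
oversubscribed = worst_case_wait_ms(9, 3.0)
```

In this model a request handled by an oversubscribed core can wait tens
of milliseconds before its thread is scheduled, while a
thread-per-core design adds no such wait.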

There are other factors, like NUMA-friendliness, but in the end it all 
boils down to efficiency and control.

None of this is new btw, it's pretty common in the storage world.


On 03/11/2017 11:18 PM, Kant Kodali wrote:
> Here is the Java version but I 
> still don't see how user-level scheduling can be beneficial (this is a
> well-debated problem). How can this add to the performance? Or, say,
> why is user-level scheduling necessary given the thread-per-core
> design and the callback mechanism?
> On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity wrote:
>     Scylla uses the Seastar framework, which provides for both
>     user-level thread scheduling and simple run-to-completion tasks.
>     Huge pages are limited to 2MB (and 1GB, but these aren't available
>     as transparent hugepages).
>     On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>     @Dor
>>     1) You guys have a CPU scheduler? you mean user level thread
>>     Scheduler that maps user level threads to kernel level threads? I
>>     thought C++ by default creates native kernel threads but sure
>>     nothing stops someone from creating a user-level scheduling
>>     library, if that's what you are talking about?
>>     2) How can one create THP of size 1KB? According to this post
>>     it looks like the valid values are 2MB and 1GB.
>>     Thanks,
>>     kant
>>     On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity wrote:
>>         Agreed, I'd recommend to treat benchmarks as a rough guide to
>>         see where there is potential, and follow through with your
>>         own tests.
>>         On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>         Benchmarks are great for FUDly blog posts. Real-world
>>>         workloads matter more. Every NoSQL vendor wins their benchmarks.
