cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-1632) Thread workflow and cpu affinity
Date Thu, 21 Nov 2013 19:27:37 GMT


Benedict commented on CASSANDRA-1632:

In fact (and sorry for the spam), looking at this a little closer, I would be strongly tempted
to use this change as a chance to get rid of active altogether and simply drain straight
from backlog. Doing so would eliminate a subtle concurrency bug in its use that I raised
internally, and we don't actually have any need for it here. All active really is
is a set of changes we're buffering to process (i.e. what is now our drain queue).

If we want getPendingMessages to be correct, we should *either way* also update a volatile
counter with the current number of outstanding messages, as active was previously accurate
for this purpose, but no longer is.
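
As a rough illustration of the two suggestions above (drain straight from backlog, and keep a separate count for getPendingMessages), here is a minimal Java sketch. It is not the code from the patch: the class name BacklogDrainer, the enqueue and drain methods, and the String message type are all hypothetical, and an AtomicInteger stands in for the "volatile counter" so that several producers can update it safely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only; names other than backlog and getPendingMessages are assumptions.
final class BacklogDrainer {
    // The single queue producers write to; there is no intermediate "active" buffer.
    private final Queue<String> backlog = new ConcurrentLinkedQueue<>();

    // Separate count of outstanding messages (assumption: AtomicInteger in place of
    // a plain volatile field, because more than one thread may enqueue).
    private final AtomicInteger pending = new AtomicInteger();

    void enqueue(String message) {
        // Increment before publishing so the count can never drop below zero
        // if the consumer polls the message immediately.
        pending.incrementAndGet();
        backlog.add(message);
    }

    // Consumer side: take messages directly from backlog until it is empty.
    List<String> drain() {
        List<String> processed = new ArrayList<>();
        String message;
        while ((message = backlog.poll()) != null) {
            processed.add(message);      // stand-in for writing the message out
            pending.decrementAndGet();   // message is no longer outstanding
        }
        return processed;
    }

    // Cheap read of the number of outstanding messages.
    int getPendingMessages() {
        return pending.get();
    }
}
```

In this sketch the counter may briefly read higher than the queue's true size while a producer is between the increment and the add, which seems acceptable for a monitoring value.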

> Thread workflow and cpu affinity
> --------------------------------
>                 Key: CASSANDRA-1632
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chris Goffinet
>            Assignee: Jason Brown
>              Labels: performance
>         Attachments: 1632-v2.txt, 1632_batchRead-v1.diff, threadAff_reads.txt, threadAff_writes.txt
> Here are some thoughts I wanted to write down. We need to run some serious benchmarks
> to see the benefits:
> 1) All thread pools for our stages use a shared queue per stage. For some stages we could
> move to a model where each thread has its own queue. This would reduce lock contention on
> the shared queue. This model only suits stages with no variance in their workload, else you
> run into thread starvation. A stage where this might work: ROW-MUTATION.
> 2) Set cpu affinity for each thread in each stage. If we can pin threads to specific
> cores, and control the workflow of a message from Thrift down to each stage, we should see
> fewer L1 cache misses. We would need to build a JNI extension (to set cpu
> affinity), as I could not find anywhere in the JDK where it is exposed.
> 3) Batch the delivery of requests across stage boundaries. Peter Schuller hasn't looked
> deep enough yet into the JDK, but he thinks there may be significant improvements to be had
> there, especially in high-throughput situations, if on each consumption you were to consume
> everything in the queue rather than incurring a synchronization point between each request.
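
Point 3 of the description can be sketched with the standard BlockingQueue.drainTo call: the consumer blocks for one task, then takes everything else already queued in a single operation instead of synchronizing once per request. This is only an illustrative sketch. The class BatchingConsumer and its methods are hypothetical and are not taken from Cassandra's stage code or from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of batched consumption across a stage boundary.
final class BatchingConsumer {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    // Producer side: hand a task to this stage.
    void submit(Runnable task) {
        queue.add(task);
    }

    // Consumer side: wait for at least one task, then drain whatever else is
    // already queued and run the whole batch. Returns the number of tasks run.
    int runNextBatch() {
        List<Runnable> batch = new ArrayList<>();
        try {
            batch.add(queue.take());     // block until the first task arrives
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            return 0;
        }
        queue.drainTo(batch);            // bulk-remove all remaining queued tasks
        for (Runnable task : batch) {
            task.run();
        }
        return batch.size();
    }
}
```

A worker thread would call runNextBatch in a loop. Whether this actually helps is something the benchmarks mentioned in the description would have to show.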

This message was sent by Atlassian JIRA
