cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-10989) Move away from SEDA to TPC
Date Fri, 08 Jan 2016 17:51:39 GMT


Aleksey Yeschenko updated CASSANDRA-10989:
    Labels: performance  (was: )

> Move away from SEDA to TPC
> --------------------------
>                 Key: CASSANDRA-10989
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>              Labels: performance
> Since its inception, Cassandra has been utilising [SEDA|]
at its core.
> As originally conceived, this means every request is split into several stages, and each
stage is backed by a thread pool. That imposes certain challenges:
> - thread parking/unparking overheads (partially improved by SEPExecutor in CASSANDRA-4718)
> - extensive context switching (i-/d- caches thrashing)
> - less-than-optimal multiple-writer/multiple-reader data structures for memtables, partitions,
metrics, and more
> - hard to grok concurrent code
> - large number of GC roots, longer TTSP
> - increased complexity for moving data structures off java heap
> - inability to easily balance writes/reads/compaction/flushing
> Latency implications of SEDA have been acknowledged by the authors themselves - see the 2010
[retrospective on SEDA|].
> To fix these issues (and more), two years ago at NGCC [~benedict] suggested moving Cassandra
away from SEDA to the more mechanically sympathetic thread-per-core architecture (TPC). See
the slides from the original presentation [here|].
> In a nutshell, each core would become a logical shared-nothing micro-instance of Cassandra,
taking over a portion of the node’s range {{*}}.
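A minimal sketch of that per-core ownership, assuming a static even split of the signed 64-bit Murmur3 token space across cores (`CoreShard` and `ownerOf` are illustrative names, not Cassandra APIs):

```java
import java.math.BigInteger;

// Hedged sketch only: statically splitting a node's token range across cores
// in a shared-nothing TPC design. Not Cassandra code.
public class CoreShard {
    // Size of the signed 64-bit token space (2^64 values).
    private static final BigInteger TOKEN_SPAN = BigInteger.ONE.shiftLeft(64);
    private final int cores;

    public CoreShard(int cores) {
        this.cores = cores;
    }

    // Map a token in [Long.MIN_VALUE, Long.MAX_VALUE] to an owning core in [0, cores).
    public int ownerOf(long token) {
        BigInteger offset = BigInteger.valueOf(token)
                .subtract(BigInteger.valueOf(Long.MIN_VALUE)); // 0 .. 2^64-1
        return offset.multiply(BigInteger.valueOf(cores)).divide(TOKEN_SPAN).intValue();
    }
}
```

With 8 cores, tokens at the bottom of the range map to core 0 and tokens at the top to core 7; a real implementation would also have to rebalance when ranges move.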
> Client connections would be assigned randomly to one of the cores (sharing a single listen
socket). A request that cannot be served by the client’s core would be proxied to the one
owning the data, similar to the way we perform remote coordination today.
> Each thread (pinned to an exclusive core) would have a single event loop, and be responsible
for both serving requests and performing maintenance tasks (flushing, compaction, repair),
scheduling them intelligently.
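The dispatch described above can be sketched, under stated assumptions, as one single-threaded event loop per core, with mis-routed requests handed to the owning core's loop (analogous to remote coordination between nodes). Thread pinning to an exclusive CPU is assumed but not shown; all names here are illustrative, not Cassandra's:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hedged sketch: per-core single-threaded event loops plus local proxying.
public class TpcLoops {
    private final ExecutorService[] loops;

    public TpcLoops(int cores) {
        loops = new ExecutorService[cores];
        for (int i = 0; i < cores; i++) {
            final String name = "core-" + i;
            // A real implementation would also pin this thread to CPU i.
            loops[i] = Executors.newSingleThreadExecutor(r -> new Thread(r, name));
        }
    }

    // Run `task` on the event loop of the core that owns the data,
    // regardless of which core accepted the client connection.
    public <T> CompletableFuture<T> proxyTo(int owningCore, Supplier<T> task) {
        return CompletableFuture.supplyAsync(task, loops[owningCore]);
    }

    public void shutdown() {
        for (ExecutorService loop : loops) loop.shutdown();
    }
}
```

Because each loop is single-threaded, all state owned by a core is touched by exactly one thread, which is what removes the multi-writer data structures and most of the context switching listed above.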
> One notable exception from the original proposal is that we cannot, unfortunately, use
linux AIO for file I/O, as it's only properly implemented for xfs. We might, however, have
a specialised implementation for xfs and Windows (based on IOCP) later. In the meantime, we
have no choice other than to hand off I/O that cannot be served from cache to a separate
threadpool.
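That handoff might look roughly like the following sketch: cache hits complete inline on the event loop, while misses are pushed to a shared blocking-I/O pool so the loop never blocks. `readFromDisk` is a stand-in for the real read path, and all names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hedged sketch of handing off cache misses to a blocking-I/O threadpool.
public class ReadPath {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final ExecutorService ioPool =
            Executors.newFixedThreadPool(4, r -> new Thread(r, "blocking-io"));

    public CompletableFuture<byte[]> read(String key) {
        byte[] hit = cache.get(key);
        if (hit != null)
            return CompletableFuture.completedFuture(hit); // served on the event loop
        // Miss: do the blocking read off-loop, then populate the cache.
        return CompletableFuture
                .supplyAsync(() -> readFromDisk(key), ioPool)
                .thenApply(value -> { cache.put(key, value); return value; });
    }

    // Placeholder for a blocking disk read.
    private byte[] readFromDisk(String key) {
        return ("disk:" + key).getBytes(StandardCharsets.UTF_8);
    }

    public void shutdown() {
        ioPool.shutdown();
    }
}
```

The first read of a key goes through the pool; subsequent reads of the same key complete immediately from cache without leaving the calling thread.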
> Transitioning from SEDA to TPC will be done in stages, incrementally and in parallel.
> This is a high-level overview meta-ticket that will track JIRA issues for each individual
stage of the transition.
> {{*}} they’ll share certain things still, like schema, gossip, file I/O threadpool(s),
and maybe MessagingService.

This message was sent by Atlassian JIRA
