cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-10989) Move away from SEDA to TPC
Date Fri, 08 Jan 2016 17:51:39 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-10989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-10989:
------------------------------------------
    Labels: performance  (was: )

> Move away from SEDA to TPC
> --------------------------
>
>                 Key: CASSANDRA-10989
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10989
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>              Labels: performance
>
> Since its inception, Cassandra has been utilising [SEDA|http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf] at its core.
> As originally conceived, it means that every request is split into several stages, each backed by a thread pool. That imposes certain challenges:
> - thread parking/unparking overheads (partially improved by SEPExecutor in CASSANDRA-4718)
> - extensive context switching (i-/d-cache thrashing)
> - less-than-optimal multiple-writer/multiple-reader data structures for memtables, partitions, metrics, and more
> - hard to grok concurrent code
> - large number of GC roots, longer time-to-safepoint (TTSP)
> - increased complexity for moving data structures off java heap
> - inability to easily balance writes/reads/compaction/flushing
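The staged model behind these challenges can be sketched roughly as follows. This is an illustrative toy, not Cassandra's actual code: stage names, pool sizes, and the request flow are all assumptions, chosen only to show how each stage owns a queue and thread pool, and how a request pays a handoff at every hop.

```python
# Toy sketch of the SEDA pattern (hypothetical names, not Cassandra's code):
# each stage has its own thread pool, and a request hops between stages,
# paying a queue handoff (and potential context switch) each time.
from concurrent.futures import ThreadPoolExecutor

class Stage:
    def __init__(self, name, workers=2):
        self.name = name
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, fn, *args):
        # Every hop between stages is a handoff to another pool --
        # the parking/unparking and switching overhead listed above.
        return self.pool.submit(fn, *args)

read_stage = Stage("READ")
response_stage = Stage("REQUEST_RESPONSE")

def read_partition(key):
    return f"data for {key}"

def respond(data):
    return f"response: {data}"

# A single read crosses two stages (two pools) before completing.
data = read_stage.submit(read_partition, "k1").result()
result = response_stage.submit(respond, data).result()
```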
> Latency implications of SEDA have been acknowledged by the authors themselves - see the 2010 [retrospective on SEDA|http://matt-welsh.blogspot.co.uk/2010/07/retrospective-on-seda.html].
> To fix these issues (and more), two years ago at NGCC [~benedict] suggested moving Cassandra away from SEDA to the more mechanically sympathetic thread-per-core (TPC) architecture. See the slides from the original presentation [here|https://docs.google.com/presentation/d/19_U8I7mq9JKBjgPmmi6Hri3y308QEx1FmXLt-53QqEw/edit?ts=56265eb4#slide=id.g98ad32b25_1_19].
> In a nutshell, each core would become a logical shared-nothing micro-instance of Cassandra, taking over a portion of the node’s range {{*}}.
> Client connections will be assigned randomly to one of the cores (sharing a single listen socket). A request that cannot be served by the client’s core will be proxied to the one owning the data, similar to the way we perform remote coordination today.
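The per-core routing above can be sketched as follows. The core count, the modulo token-to-core mapping, and the function names are illustrative assumptions, not the proposed implementation:

```python
# Illustrative sketch of per-core request routing (hypothetical, not
# Cassandra's code): each core owns a slice of the token range, and a
# request landing on the wrong core is proxied to the owning core,
# analogous to remote coordination between nodes today.

NUM_CORES = 8  # assumed core count for the example

def owning_core(partition_token: int) -> int:
    """Map a partition token to the core owning its range slice."""
    return partition_token % NUM_CORES

def handle_request(my_core: int, partition_token: int) -> str:
    owner = owning_core(partition_token)
    if owner == my_core:
        return f"served locally on core {my_core}"
    # Cannot serve locally: forward to the owner, as a coordinator
    # forwards to a replica today.
    return f"proxied from core {my_core} to core {owner}"
```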
> Each thread (pinned to an exclusive core) would have a single event loop and be responsible for both serving requests and performing maintenance tasks (flushing, compaction, repair), scheduling them intelligently.
> One notable exception from the original proposal is that we cannot, unfortunately, use linux AIO for file I/O, as it's only properly implemented for xfs. We might, however, add a specialised implementation for xfs and Windows (based on IOCP) later. In the meantime, we have no choice but to hand off I/O that cannot be served from the cache to a separate threadpool.
> Transitioning from SEDA to TPC will be done in stages, incrementally and in parallel.
> This is a high-level overview meta-ticket that will track the JIRA issues for each individual stage.
> {{*}} they’ll still share certain things, like schema, gossip, file I/O threadpool(s), and maybe MessagingService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
