cassandra-commits mailing list archives

From "Aleksey Yeschenko (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-10989) Move away from SEDA to TPC
Date Fri, 08 Jan 2016 17:47:40 GMT
Aleksey Yeschenko created CASSANDRA-10989:
---------------------------------------------

             Summary: Move away from SEDA to TPC
                 Key: CASSANDRA-10989
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10989
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Aleksey Yeschenko


Since its inception, Cassandra has been utilising [SEDA|http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf]
at its core.

As originally conceived, this means every request is split into several stages, and each stage
is backed by its own thread pool (a minimal sketch of the stage-per-pool pattern follows the list below). That imposes certain challenges:
- thread parking/unparking overheads (partially improved by SEPExecutor in CASSANDRA-4718)
- extensive context switching (i-/d-cache thrashing)
- less than optimal multiple-writer/multiple-reader data structures for memtables, partitions,
metrics, and more
- concurrent code that is hard to grok
- a large number of GC roots, and longer time-to-safepoint (TTSP) pauses
- increased complexity when moving data structures off the Java heap
- inability to easily balance writes/reads/compaction/flushing
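For readers unfamiliar with the pattern, here is a minimal sketch of what "a stage backed by a thread pool" looks like. The class and stage names are purely illustrative; this is not Cassandra's actual Stage/SEPExecutor code, just the general shape of the hand-offs that cause the parking and context-switching costs listed above.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// One SEDA-style stage: a queue plus its own thread pool. A request crosses
// several such stages, hopping threads at each boundary.
final class Stage<T>
{
    private final ExecutorService pool;
    private final Consumer<T> work;

    Stage(String name, int threads, Consumer<T> work)
    {
        this.pool = Executors.newFixedThreadPool(threads, r -> new Thread(r, name));
        this.work = work;
    }

    // Enqueue a task; it runs later on one of this stage's threads,
    // paying a park/unpark and a context switch on the way.
    void submit(T task)
    {
        pool.execute(() -> work.accept(task));
    }

    void shutdown()
    {
        pool.shutdown();
    }

    public static void main(String[] args)
    {
        Stage<String> mutationStage = new Stage<>("MutationStage", 32,
                m -> System.out.println("applied " + m));
        // A write hops off the request thread onto the mutation stage's pool.
        mutationStage.submit("INSERT ...");
        mutationStage.shutdown();
    }
}
{code}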

Latency implications of SEDA have been acknowledged by the authors themselves; see the 2010 [retrospective
on SEDA|http://matt-welsh.blogspot.co.uk/2010/07/retrospective-on-seda.html].

To fix these issues (and more), two years ago at NGCC, [~benedict] suggested moving Cassandra
away from SEDA to the more mechanically sympathetic thread-per-core architecture (TPC). See
the slides from the original presentation [here|https://docs.google.com/presentation/d/19_U8I7mq9JKBjgPmmi6Hri3y308QEx1FmXLt-53QqEw/edit?ts=56265eb4#slide=id.g98ad32b25_1_19].

In a nutshell, each core would become a logical shared nothing micro instance of Cassandra,
taking over a portion of the node’s range {{*}}.

Client connections will be assigned randomly to one of the cores (sharing a single listen
socket). A request that cannot be served by the client’s core will be proxied to the one
owning the data, similar to the way we perform remote coordination today.
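A minimal sketch of the intended routing, under the assumptions above. The class, the token-to-core mapping, and the per-core single-threaded executors are all illustrative, not an actual or proposed Cassandra API; the point is only that a request either stays on the connection's core or is handed to the core that owns the token, much like remote coordination today.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CoreRouter
{
    private final int cores;
    private final ExecutorService[] coreLoops;   // one single-threaded loop per core

    CoreRouter(int cores)
    {
        this.cores = cores;
        this.coreLoops = new ExecutorService[cores];
        for (int i = 0; i < cores; i++)
            coreLoops[i] = Executors.newSingleThreadExecutor();
    }

    // Each core owns a slice of the node's token range (naive mapping for illustration).
    int owningCore(long token)
    {
        return (int) Math.floorMod(token, (long) cores);
    }

    // If the connection's core owns the token, the request is served locally;
    // otherwise it is "proxied" onto the owning core's loop.
    CompletableFuture<String> execute(int connectionCore, long token)
    {
        int owner = owningCore(token);
        return CompletableFuture.supplyAsync(
                () -> "served by core " + owner + (owner == connectionCore ? " (local)" : " (proxied)"),
                coreLoops[owner]);
    }
}
{code}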

Each thread (pinned to an exclusive core) would have a single event loop, and be responsible
for both serving requests and performing maintenance tasks (flushing, compaction, repair),
scheduling them intelligently.
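As an illustration only, such a per-core loop could look roughly like the following. Real core pinning would need native support, and the actual scheduler would be far smarter than this naive interleaving of requests and background chores; the sketch just shows a single thread owning both kinds of work.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// One event loop per core: a single thread drains client requests and, when
// there is slack, runs maintenance work (flushing, compaction, repair).
final class CoreEventLoop implements Runnable
{
    private final BlockingQueue<Runnable> requests = new LinkedBlockingQueue<>();
    private final BlockingQueue<Runnable> maintenance = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    void submitRequest(Runnable r)     { requests.add(r); }
    void submitMaintenance(Runnable r) { maintenance.add(r); }
    void stop()                        { running = false; }

    @Override
    public void run()
    {
        while (running)
        {
            try
            {
                // Serve client requests first...
                Runnable request = requests.poll(1, TimeUnit.MILLISECONDS);
                if (request != null)
                    request.run();

                // ...then steal idle cycles for maintenance tasks.
                Runnable chore = maintenance.poll();
                if (chore != null)
                    chore.run();
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                running = false;
            }
        }
    }
}
{code}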

One notable exception from the original proposal is that we cannot, unfortunately, use Linux
AIO for file I/O, as it's only properly implemented for xfs. We might, however, have a specialised
implementation for xfs and Windows (based on IOCP) later. In the meantime, we have no
choice but to hand off I/O that cannot be served from cache to a separate threadpool.
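A rough sketch of that hand-off, with illustrative names only (the cache check, the I/O pool size, and the callback shape are assumptions for the example, not the proposed implementation): reads that hit the cache never leave the core, while cache misses hop to a shared blocking-I/O pool and complete back on the core's loop.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class IoHandOff
{
    private final ExecutorService ioPool = Executors.newFixedThreadPool(8);       // blocking file I/O
    private final ExecutorService coreLoop = Executors.newSingleThreadExecutor(); // the core's event loop

    CompletableFuture<byte[]> read(String file, long offset, byte[] cached)
    {
        if (cached != null)
            return CompletableFuture.completedFuture(cached);     // served from cache, stays on the core

        return CompletableFuture
               .supplyAsync(() -> blockingRead(file, offset), ioPool)  // hop to the I/O pool
               .thenApplyAsync(bytes -> bytes, coreLoop);              // resume on the core's loop
    }

    private byte[] blockingRead(String file, long offset)
    {
        // Placeholder for an actual pread/FileChannel read.
        return new byte[0];
    }
}
{code}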

Transitioning from SEDA to TPC will be done in stages, incrementally and in parallel.

This is a high-level overview meta-ticket that will track JIRA issues for each individual
stage.

{{*}} They'll still share certain things, like schema, gossip, the file I/O threadpool(s), and
maybe MessagingService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
