cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-7282) Faster Memtable map
Date Sat, 13 Sep 2014 09:35:34 GMT


Benedict updated CASSANDRA-7282:
    Attachment: profile.yaml

OK, so I ran a more realistic workload with the attached profile.yaml: a 50/50 read/write mix,
with reads favouring recently written partitions according to an extreme distribution, i.e. the
following stress command:

./tools/bin/cassandra-stress user profile=profile.yaml ops\(insert=5,read=5\) n=20000000 \
    -pop seq=1..10M read-lookback=extreme\(1..1M,2\) -rate threads=200 -mode cql3 native prepared

This workload is still geared towards exhibiting favourable behaviour, but it is certainly
larger than memory.

The attached graph comparing the results (run1.svg) still shows a clear improvement: around
10% higher throughput, reduced latencies, and reduced total GC work. It also results in less
frequent flushes, presumably because it requires slightly less memory than CSLM
(ConcurrentSkipListMap).

> Faster Memtable map
> -------------------
>                 Key: CASSANDRA-7282
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>         Attachments: profile.yaml, reads.svg, run1.svg, writes.svg
> Currently we maintain a ConcurrentSkipListMap of DecoratedKey -> Partition in our
> memtables. Maintaining this is an O(log n) operation; since the vast majority of users use
> a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The
> list would impose the normal order on the collection, but a hash index would live alongside
> as part of the same data structure, simply mapping into the list and permitting O(1) lookups
> and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we
> can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes
> to be checked at once, further reducing the constant-factor costs for lookups.
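
To illustrate the idea in the description above, here is a minimal single-threaded sketch of a
hybrid ordered list / hash index. It is not the code from the patch: the class name
HybridOrderedHashMap, the use of a plain long token as the key, the fixed-size index and the
per-bucket sentinel nodes are all assumptions made for illustration. A real memtable structure
would also need to be safe for concurrent use and to resize its index. Lookups and inserts are
only expected O(1) here if tokens are uniformly distributed (as with a hash partitioner) and the
number of buckets is proportional to the number of entries.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one sorted linked list of tokens, plus a bucket index pointing into it. */
class HybridOrderedHashMap<V> {
    private static final class Node<V> {
        final long token;
        final boolean sentinel; // true for the per-bucket marker nodes, which hold no data
        V value;
        Node<V> next;

        Node(long token, V value, boolean sentinel) {
            this.token = token;
            this.value = value;
            this.sentinel = sentinel;
        }
    }

    private final int shift;
    private final Node<V>[] buckets; // buckets[i] is the sentinel that starts bucket i in the list
    private int size;

    @SuppressWarnings("unchecked")
    HybridOrderedHashMap(int indexBits) {
        if (indexBits < 1 || indexBits > 20)
            throw new IllegalArgumentException("indexBits must be in [1, 20]");
        this.shift = 64 - indexBits;
        this.buckets = (Node<V>[]) new Node[1 << indexBits];
        // Eagerly link one sentinel per bucket, in bucket order (assumption: fixed-size index).
        Node<V> prev = null;
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new Node<>(0L, null, true);
            if (prev != null) prev.next = buckets[i];
            prev = buckets[i];
        }
    }

    /** Top bits of the token, with the sign bit flipped so bucket order matches signed token order. */
    private int bucketOf(long token) {
        return (int) ((token ^ Long.MIN_VALUE) >>> shift);
    }

    /** Inserts or overwrites; returns the previous value, or null if the token was absent. */
    V put(long token, V value) {
        Node<V> prev = buckets[bucketOf(token)];
        Node<V> cur = prev.next;
        // Walk only within this bucket: stop at the next sentinel or the first token >= target.
        while (cur != null && !cur.sentinel && cur.token < token) {
            prev = cur;
            cur = cur.next;
        }
        if (cur != null && !cur.sentinel && cur.token == token) {
            V old = cur.value;
            cur.value = value;
            return old;
        }
        Node<V> node = new Node<>(token, value, false);
        node.next = cur;
        prev.next = node;
        size++;
        return null;
    }

    V get(long token) {
        Node<V> cur = buckets[bucketOf(token)].next;
        while (cur != null && !cur.sentinel && cur.token < token) cur = cur.next;
        return (cur != null && !cur.sentinel && cur.token == token) ? cur.value : null;
    }

    /** Ordered iteration is a walk of the list, skipping sentinels. */
    List<Long> tokensInOrder() {
        List<Long> out = new ArrayList<>(size);
        for (Node<V> n = buckets[0]; n != null; n = n.next)
            if (!n.sentinel) out.add(n.token);
        return out;
    }

    int size() { return size; }
}
```

Because the bucket is derived from the high bits of the token, the index and the list agree on
ordering, so a range scan never needs to consult the index.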

This message was sent by Atlassian JIRA
