cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Branimir Lambov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7282) Faster Memtable map
Date Sat, 13 Sep 2014 21:59:34 GMT


Branimir Lambov commented on CASSANDRA-7282:

{quote}I think the constraint is sufficient.{quote}
The constraint is sufficient, but it is too strong and isn't satisfied by the current version
of the code.

Assuming that < is a total ordering (since any ordering defined by a Comparable must be
total), the problem with the current k1 < k2 -> hash(k1) < hash(k2) is that it implies
k1 != k2 -> hash(k1) != hash(k2) and thus hash(k1) == hash(k2) -> k1 == k2, because
k1 != k2 means that either k1 < k2 or k2 < k1. This means that the condition will only
be satisfied for a hash without collisions.

A bidirectional <= has the same problem, since hash(k1) == hash(k2) --> hash(k1) <=
hash(k2) && hash(k2) <= hash(k1) --> k1 <= k2 && k2 <= k1 -->
k1 == k2. It is actually a stronger statement than the original condition, hence it is not
a surprise that it doesn't solve the problem. We need something weaker.

To make it weaker we either strengthen the premise, or weaken the conclusion. We can do the
latter by making the comparison non-strict: k1 < k2 -> hash(k1) <= hash(k2). For
this one k1 != k2 implies only hash(k1) <= hash(k2) || hash(k1) >= hash(k2) which is
trivially true. Looking at the code this is actually the condition you use, because you choose
predecessors based on hash order rather than direct comparison.

{quote}It would hold true for k1:1, k2:2, hash(k1):2, hash(k2):2.{quote}
I don't see anything wrong with this, just a hash collision.

> Faster Memtable map
> -------------------
>                 Key: CASSANDRA-7282
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>         Attachments: profile.yaml, reads.svg, run1.svg, writes.svg
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in our
memtables. Maintaining this is an O(lg(n)) operation; since the vast majority of users use
a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The
list would impose the normal order on the collection, but a hash index would live alongside
as part of the same data structure, simply mapping into the list and permitting O(1) lookups
and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we
can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes
to be checked at once,  further reducing the constant factor costs for lookups.

This message was sent by Atlassian JIRA

View raw message