cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
Date Sat, 13 Sep 2014 21:49:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132953#comment-14132953
] 

Benedict edited comment on CASSANDRA-7282 at 9/13/14 9:49 PM:
--------------------------------------------------------------

bq. Since this is the case, should we restrict this method to Murmur tokens only?

Well, if I can spend the time ensuring safety we should be able to use this for RandomPartitioner
also.

bq. I would still change the condition in the preface to use non-strict inequality, though,
because cropping tokens to 32 bits will introduce collisions.

-There will be collisions with or without truncation. The fact that there are collisions doesn't
affect the constraint I've imposed upon the data; we assume nothing about the dataset when
two hashCode()s are equal, and simply resort to the underlying token comparator, so I think
the constraint is sufficient. Non-strict equality is surely more broken, however? It would
hold true for k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional
implication, i.e. k1 <= k2 <==> k1.hashCode() <= k2.hashCode().-

I just reread your comment and realise you meant the RHS only. Agreed this should be changed.

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an "ordering key"
or "ordering prefix"?

Perhaps. This was the simplest approach, and since it *is* a hash key used to index a hash
table it seems suitable to use hashCode(), and impose the extra constraint contextually. I'm
fairly neutral though; we certainly could introduce a new interface. I'll see how it looks.


was (Author: benedict):
bq. Since this is the case, should we restrict this method to Murmur tokens only?

Well, if I can spend the time ensuring safety we should be able to use this for RandomPartitioner
also.

bq. I would still change the condition in the preface to use non-strict inequality, though,
because cropping tokens to 32 bits will introduce collisions.

There will be collisions with or without truncation. The fact that there are collisions doesn't
affect the constraint I've imposed upon the data; we assume nothing about the dataset when
two hashCode()s are equal, and simply resort to the underlying token comparator, so I think
the constraint is sufficient. Non-strict equality is surely more broken, however? It would
hold true for k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional
implication, i.e. k1 <= k2 <==> k1.hashCode() <= k2.hashCode().

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an "ordering key"
or "ordering prefix"?

Perhaps. This was the simplest approach, and since it *is* a hash key used to index a hash
table it seems suitable to use hashCode(), and impose the extra constraint contextually. I'm
fairly neutral though; we certainly could introduce a new interface. I'll see how it looks.

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in our
memtables. Maintaining this is an O(lg(n)) operation; since the vast majority of users use
a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The
list would impose the normal order on the collection, but a hash index would live alongside
as part of the same data structure, simply mapping into the list and permitting O(1) lookups
and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we
can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes
to be checked at once,  further reducing the constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message