cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joshua McKenzie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7282) Faster Memtable map
Date Fri, 26 Sep 2014 20:29:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149929#comment-14149929
] 

Joshua McKenzie commented on CASSANDRA-7282:
--------------------------------------------

(nits interleaved with the rest)

General:
* Could use a comment on LongToken.hashCode() explaining reasoning for shift / truncation.
 While it makes sense in the context of this patch, there's little context within the class
itself to explain why we're just lopping off half the bits.
* Consider more descriptive variable names rather than abbreviations in AtomicReferenceArrayUpdater
(rather than V exp, V upd)
* AtomicReferenceArrayUpdater: [could use some comments|https://github.com/belliottsmith/cassandra/blob/7282-fastmemtablemap/src/java/org/apache/cassandra/utils/concurrent/AtomicReferenceArrayUpdater.java#L35-41]
clarifying index calculation logic

NonBlockingHashOrderedMap.java:
* I think a diagram showing how the container works ([see CSLM|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/concurrent/ConcurrentSkipListMap.java])
would be quite helpful
* Document why INDEX_SHIFT == 18. Currently just a magic number
* Why do we bit shift on generation of index node instead of just using integer value?
      {code} private volatile Node<K, V>[][] index = new Node[1][1 << 10]; {code}
* It might help to reference the shalev whitepaper's title as a reference for some more formal
reading on a similar data structure to what we're using here
* this.index aliasing in predecessor appears redundant
* [predecessor() is still very dense|https://github.com/belliottsmith/cassandra/blob/7282-fastmemtablemap/src/java/org/apache/cassandra/concurrent/NonBlockingHashOrderedMap.java#L182-203].
 Not a bad thing if necessary, but I'm wondering if there aren't opportunities to abstract
out some of the guts of what's going on in there to segregate the algorithm from the implementation.
* The comments on indexHash and firstHashOfIndex are quite helpful.
* As non-blocking algorithms are still somewhat new, it might be worth documenting why we're
using [lazy updates|http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6275329] for future
readers of the code.  Might not.
* Would it be worth making maybeResize threshold configurable?  What's our reasoning for targeting
66%?
* [maybeResize|https://github.com/belliottsmith/cassandra/blob/7282-fastmemtablemap/src/java/org/apache/cassandra/concurrent/NonBlockingHashOrderedMap.java#L268-274]
logic could use some more clarity on documentation.
* [Signal-to-noise ratio in resize() seems off|https://github.com/belliottsmith/cassandra/blob/7282-fastmemtablemap/src/java/org/apache/cassandra/concurrent/NonBlockingHashOrderedMap.java#L293-302].
 Might be worth factoring out all the INDEX_SHIFTING and Math.max/min to get implementation
details out of the immediate visibility of the algorithm's scope, assuming we can do that
without impacting performance.
* in putIfAbsent / maybeResize / resize: Just to confirm - are we relying on the executor
to drop duplicate resize calls on the floor?  If so, it might be worth documenting.

LongNonBlockingHashOrderedMapTest.java
* Reinforces the feeling that the heavy bitwise operations could serve to be abstracted out
or more heavily commented to help reduce complexity.  For example, [this|https://github.com/belliottsmith/cassandra/blob/7282-fastmemtablemap/test/long/org/apache/cassandra/concurrent/LongNonBlockingHashOrderedMapTest.java#L213-216]
is opaque.  While it's parsable and appears correct, some trivial commenting could help smooth
out the reading / maintaining process
* LongNonBlockingHashOrderedMapTest times out on default settings for ant long-test on my
box
* And still times out at 10x timeout threshold.  :)  This may be a Windows thing - do they
run to completion within the time on your setup?

In summary - the code looks good but in my opinion could use some more documentation and possibly
some abstracting out of lower-level implementation details.  The logic of the container itself
is definitely simpler than the CSLM and seems more tailored to our needs.

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in our
memtables. Maintaining this is an O(lg(n)) operation; since the vast majority of users use
a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The
list would impose the normal order on the collection, but a hash index would live alongside
as part of the same data structure, simply mapping into the list and permitting O(1) lookups
and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we
can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes
to be checked at once,  further reducing the constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message