cassandra-commits mailing list archives

From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8230) LongToken no longer needs to use a boxed Long
Date Wed, 05 Nov 2014 11:10:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198224#comment-14198224 ]

Branimir Lambov commented on CASSANDRA-8230:
--------------------------------------------

bq. This is making the inheritance hierarchy for tokens less consistent / clear rather than
more. (3/5 Abstract, 2/5 Token). Rolling back some of the structural changes from CASSANDRA-8171
this soon makes me wary in general principle, though it seems quite reasonable in this case.

Interesting viewpoint. The {{AbstractToken}} introduced in 8171 was not meant to be an access
point to tokens (hence its package-private nature), just an aid to implementing concrete instances.
The fact that you read it that way signals to me that leaving all tokens as its descendants
is dangerous, since it allows people to rely on that fact.

Given that it is only an aid, applying it to cases where every method needs to be overridden
(i.e. {{BytesToken}}) is wrong in principle. In that sense this ticket finishes the work that
8171 started.

bq. Do we have reason to believe that boxing/unboxing on tokens is a performance problem and
that, if so, this patch addresses that problem?

The bigger problem is not boxing/unboxing as operations, but the fact that the longs in {{LongToken}}
get stored in {{Memtable}} boxed. This doubles the amount of memory used by a {{LongToken}}
(i.e. increases the space taken by every row by ~32 bytes), which also puts greater pressure
on caches and memory management. It also makes it possible for the {{LongToken}} object
and the actual token value to be separated in memory, requiring two memory fetches per read
instead of one. This would be very costly, especially while scanning the CSLM, but because of
the GC it should not happen often in practice. There is one very common exception, which the
hardware handles somewhat better: {{LongToken}} and {{Long}} sitting next to each other, but
on two separate cache lines.
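
To make the layout difference concrete, here is a minimal sketch of the two variants. The class names {{BoxedLongToken}}, {{PrimitiveLongToken}} and {{TokenLayoutSketch}} are hypothetical and heavily simplified; this is not the actual Cassandra source or the attached patch, and the skip-list map only stands in for the memtable's CSLM.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified illustration only; not the real Cassandra classes.

// Boxed variant: the token object holds a reference to a separate
// java.lang.Long, which is its own heap object with its own header.
final class BoxedLongToken implements Comparable<BoxedLongToken> {
    final Long token;

    BoxedLongToken(Long token) { this.token = token; }

    @Override
    public int compareTo(BoxedLongToken other) {
        // Each comparison follows two references (this.token, other.token)
        // before it can reach the actual long values.
        return token.compareTo(other.token);
    }
}

// Primitive variant: the value lives inline in the token object,
// so there is no second object to allocate or dereference.
final class PrimitiveLongToken implements Comparable<PrimitiveLongToken> {
    final long token;

    PrimitiveLongToken(long token) { this.token = token; }

    @Override
    public int compareTo(PrimitiveLongToken other) {
        return Long.compare(token, other.token);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PrimitiveLongToken
            && ((PrimitiveLongToken) o).token == token;
    }

    @Override
    public int hashCode() { return Long.hashCode(token); }
}

final class TokenLayoutSketch {
    public static void main(String[] args) {
        // Stand-in for a memtable keyed by token (assumption for illustration).
        ConcurrentSkipListMap<PrimitiveLongToken, String> rows =
            new ConcurrentSkipListMap<>();
        rows.put(new PrimitiveLongToken(42L), "row-b");
        rows.put(new PrimitiveLongToken(-7L), "row-a");
        // Scanning the map compares tokens without touching a boxed Long.
        System.out.println(rows.firstEntry().getValue()); // prints "row-a"
    }
}
```

Both variants order tokens identically; the difference is only in how many objects each key occupies and how many memory locations a comparison has to touch.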

The difference should be quite measurable for in-memory workloads. I will try to get a stress
test demonstrating it.

> LongToken no longer needs to use a boxed Long
> ---------------------------------------------
>
>                 Key: CASSANDRA-8230
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8230
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Minor
>             Fix For: 2.1.2
>
>         Attachments: 8230-2.1-v2.patch, 8230-2.1.patch
>
>
> After CASSANDRA-8171 a token reference field is no longer a requirement for tokens. This
> permits LongTokens to include a primitive long field, which should noticeably improve the
> space and time efficiency of the Murmur3Partitioner tokens.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
