cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
Date Wed, 04 Feb 2015 14:49:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305181#comment-14305181 ]

Benedict edited comment on CASSANDRA-6936 at 2/4/15 2:49 PM:
-------------------------------------------------------------

bq. Maybe a time will come where comparisons are our main bottleneck but we're not there atm and future storage changes will probably impact this as well.

We are there already. Speak to [~jblangston@datastax.com] and [~jshook], for instance, who have
each been working with users whose performance is bottlenecked by the CPU cost of comparisons. One
of these customers is seeing a blistering 4MB/s of compaction throughput with their CPUs maxed
out. The other had to stop using collections entirely. Comparisons are pretty much the main time
sink for C* when working with clustering columns, and especially collections.

The big problem fields are int, bigint and timestamp. All of these are very commonly used,
and trivial to make byte-order comparable. The optimisations made a little while back had
a significant impact on the CPU cost of merges, and they all depend on byte-order comparability
of every clustering column on the table. For such small fields the cost of the virtual invocation
is a significant percentage of the time spent, since the data will generally be in cache, having
just been read off disk. If all of the fields are byte-order comparable, we can avoid multiple such
virtual invocations. It also improves instruction cache occupancy for these common methods, since
they all go through the same codepath (at the time of making those optimisations, instruction
cache misses were actually a significant problem, and they are likely worse on a live server with
a more varied workload).
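A minimal sketch of what "trivial to make byte-order comparable" can look like for fixed-width signed types, assuming the common technique of flipping the sign bit and writing the value big-endian. This is an illustration, not necessarily the encoding chosen for CASSANDRA-6936; the class and method names (`ByteComparableSketch`, `encodeInt`, `encodeLong`, `concat`, `compareUnsigned`) are hypothetical, and it assumes a timestamp is serialized as a signed 64-bit millisecond count. Plain big-endian two's complement sorts negative values after positive ones under an unsigned byte comparison; after the sign-bit flip, a multi-column clustering prefix can be compared with one type-agnostic loop instead of one virtual `compare` call per column.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class for illustration; not Cassandra's actual API.
final class ByteComparableSketch {

    private ByteComparableSketch() {}

    // int: flip the sign bit so negatives sort before positives, then write big-endian.
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // bigint, and (by assumption) timestamp stored as signed millis since the epoch.
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Concatenate fixed-width column encodings into one clustering prefix.
    // Order-preserving here only because every part has a fixed width.
    static byte[] concat(byte[]... parts) {
        int len = 0;
        for (byte[] p : parts) len += p.length;
        ByteBuffer out = ByteBuffer.allocate(len);
        for (byte[] p : parts) out.put(p);
        return out.array();
    }

    // The single, type-agnostic comparison: unsigned lexicographic byte order,
    // with a shorter array sorting first when it is a prefix of the longer one.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Raw two's complement: -1 (0xFFFFFFFF) wrongly sorts after 0 when compared unsigned.
        byte[] rawMinusOne = ByteBuffer.allocate(4).putInt(-1).array();
        byte[] rawZero = ByteBuffer.allocate(4).putInt(0).array();
        System.out.println(compareUnsigned(rawMinusOne, rawZero) > 0);        // true (wrong order)
        System.out.println(compareUnsigned(encodeInt(-1), encodeInt(0)) < 0); // true (correct order)

        // Two clustering columns (int, bigint) compared with a single call.
        byte[] k1 = concat(encodeInt(7), encodeLong(-5L));
        byte[] k2 = concat(encodeInt(7), encodeLong(3L));
        System.out.println(compareUnsigned(k1, k2) < 0);                      // true
    }
}
```

For variable-length types (text, blob, collections) plain concatenation is not order-preserving, so such fields would need a terminator or escaping scheme, which this sketch does not attempt. On Java 9+, `java.util.Arrays.compareUnsigned(byte[], byte[])` gives the same ordering as the hand-written loop.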

Future storage changes also largely depend on it for delivering the best performance, as the
binary trie is likely to be the most significant win. Further, CASSANDRA-8731 can perhaps exploit
the nature of these fields to reduce the cost of merging even further.
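The trie depends on the same property: a byte-wise trie orders keys purely by their unsigned byte sequence, so walking it yields a type's natural order only if that type's encoding is byte-order comparable. Below is a toy in-memory sketch of that idea. It is hypothetical (`ByteTrieSketch`, `sortedViaTrie`, `encodeInt` and `decodeInt` are illustrative names), it is not the on-disk trie format being proposed, and it assumes a sign-bit-flipping big-endian encoding for int rather than whatever format the ticket finally adopts.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy trie keyed on raw bytes; not Cassandra's storage format.
final class ByteTrieSketch {

    // One child slot per unsigned byte value (0..255), so iterating the
    // slots in index order visits children in unsigned byte order.
    private static final class Node {
        final Node[] children = new Node[256];
        boolean terminal; // true if a key ends at this node
    }

    private final Node root = new Node();

    void insert(byte[] key) {
        Node n = root;
        for (byte b : key) {
            int i = b & 0xFF; // treat the byte as unsigned
            if (n.children[i] == null) n.children[i] = new Node();
            n = n.children[i];
        }
        n.terminal = true;
    }

    // Depth-first walk: a key is emitted before any longer key it prefixes,
    // and children are visited in ascending unsigned byte order, so the
    // result is in unsigned lexicographic order.
    List<byte[]> inOrder() {
        List<byte[]> out = new ArrayList<>();
        walk(root, new byte[0], out);
        return out;
    }

    private static void walk(Node n, byte[] prefix, List<byte[]> out) {
        if (n.terminal) out.add(prefix);
        for (int i = 0; i < 256; i++) {
            if (n.children[i] != null) {
                byte[] next = Arrays.copyOf(prefix, prefix.length + 1);
                next[prefix.length] = (byte) i;
                walk(n.children[i], next, out);
            }
        }
    }

    // Assumed byte-comparable int encoding: flip the sign bit, write big-endian.
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    static int decodeInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Insert encoded ints, then read them back in trie order (duplicates collapse).
    static int[] sortedViaTrie(int... values) {
        ByteTrieSketch trie = new ByteTrieSketch();
        for (int v : values) trie.insert(encodeInt(v));
        List<byte[]> keys = trie.inOrder();
        int[] result = new int[keys.size()];
        for (int i = 0; i < result.length; i++) result[i] = decodeInt(keys.get(i));
        return result;
    }

    public static void main(String[] args) {
        int[] sorted = sortedViaTrie(42, -7, 0, Integer.MIN_VALUE, 1000);
        System.out.println(Arrays.toString(sorted)); // [-2147483648, -7, 0, 42, 1000]
    }
}
```

A 256-slot array per node keeps the sketch simple but wastes memory; practical tries typically use compact node representations, which this example does not model.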

That all said, CASSANDRA-8731 may well help get some of the way there by itself, depending
on how things pan out.


was (Author: benedict):
bq. Maybe a time will come where comparisons are our main bottleneck but we're not there atm and future storage changes will probably impact this as well.

We are there already. Speak to [~jblangston@datastax.com], for instance, who has recently been
working with two users whose performance is bottlenecked by the CPU cost of comparisons. One of
these customers is seeing a blistering 4MB/s of compaction throughput with their CPUs maxed out.
Comparisons are pretty much the main time sink for C* when working with clustering columns,
and especially collections.

The big problem fields are int, bigint and timestamp. All of these are very commonly used,
and trivial to make byte-order comparable. The optimisations made a little while back had
a significant impact on the CPU cost of merges, and they all depend on byte-order comparability
of every clustering column on the table. For such small fields the cost of the virtual invocation
is a significant percentage of the time spent, since the data will generally be in cache, having
just been read off disk. If all of the fields are byte-order comparable, we can avoid multiple such
virtual invocations. It also improves instruction cache occupancy for these common methods, since
they all go through the same codepath (at the time of making those optimisations, instruction
cache misses were actually a significant problem, and they are likely worse on a live server with
a more varied workload).

Future storage changes also largely depend on it for delivering the best performance, as the
binary trie is likely to be the most significant win. Further, CASSANDRA-8731 can perhaps exploit
the nature of these fields to reduce the cost of merging even further.

> Make all byte representations of types comparable by their unsigned byte representation only
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6936
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>
> This could be a painful change, but it is necessary for implementing a trie-based index,
> and settling for less would be suboptimal. It should also make comparisons cheaper all-round,
> and since comparison operations are pretty much the majority of C*'s business, this should
> be easily felt (see CASSANDRA-6553 and CASSANDRA-6934 for examples of minor changes with
> major performance impacts). Needing no copying, special casing or slicing should also mean
> fewer opportunities to introduce performance regressions.
> Since I have slated a lot of non-backwards-compatible sstable changes for 3.0, hopefully
> this won't be too much more of a burden.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
