cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6726) Recycle CompressedRandomAccessReader/RandomAccessReader buffers independently of their owners, and move them off-heap when possible
Date Tue, 19 Aug 2014 08:03:18 GMT


Benedict commented on CASSANDRA-6726:

bq. Buffer sizes do vary

Other than the compressed input size, where do they vary? (I might have missed it). The input
size *should* be bound by the chunk size; if it's larger we have a negative compression ratio,
which should be rare, in which case I am happy either _not_ caching them, or with bumping
the size of all buffers we allocate to accommodate the slight extra room (should be very minimal).

bq. so do the required buffer types

I'm not sure this dramatically affects the code, but also with CASSANDRA-7039 this should
no longer be the case, so I'd prefer not to add too much complexity that will be soon made

bq. I tried to define a flexible way to solve this which can be reused in other parts of Cassandra
that require buffers. 

I appreciate this, however I do not think there are other places where it will be meaningfully
useful, now or in the immediate future, and so I'd prefer not to pollute the namespace of
our utilities (which is already unpleasant enough wrt pooling) especially as there is increased
complexity in maintaining this more general pool. I'm not completely against the pooling you
have defined, I just don't see it as better than the simpler approach. If you can make a compelling
case, please do so.

bq. My usual workflow is to first create a patch ... then .... a lot of measurements and comparisons
in the process.

It's good to state when you post a patch that you don't consider it complete (and perhaps
outline things you intend to cover still) so that review can be better targeted; otherwise
it will be assumed the patch is approximately final. Don't go too overboard with micro-measurement.
The particular impact here is likely to be hard to measure in benchmarks, especially with
confidence the results are generalizable, but it's a reasonably obvious win for systems experiencing
GC pressure and we have plenty of other things to address. At the moment there are a lot of
major performance improvements to make with big impacts, so I prefer not to get bogged down
in careful measurement unless it's clear the codepath can have significant performance impact
(as in, at least measurable single digit percentages in common workloads)

bq. For ByteArrayCompressor.uncompress...

Either is fine - the additional complexity is not significant, but forbidding the path is
also fine. So long as we avoid a scenario where we can later accidentally introduce it as
a common (and GC churn inducing) path.

> Recycle CompressedRandomAccessReader/RandomAccessReader buffers independently of their
owners, and move them off-heap when possible
> -----------------------------------------------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-6726
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>         Attachments: cassandra-6726.patch
> Whilst CRAR and RAR are pooled, we could and probably should pool the buffers independently,
so that they are not tied to a specific sstable. It may be possible to move the RAR buffer
off-heap, and the CRAR sometimes (e.g. Snappy may possibly support off-heap buffers)

This message was sent by Atlassian JIRA

View raw message