cassandra-commits mailing list archives

From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8894) Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
Date Mon, 20 Jul 2015 01:37:05 GMT


Stefania commented on CASSANDRA-8894:

[~benedict] thanks for your review comments. I've applied them in this [latest commit|],
which is also duplicated on the pre-8099 branch and available for performance testing. In
addition to your comments, I've added a couple of unit tests and changed the *>* to *>=*
when determining whether to add one page. This way, if the page cross chance is zero, we
always add one page, even at the boundaries (when the record size is a multiple of the page
size). If you want to override this during commit, that's fine, but you will need to change
the unit tests' expected values too. Because of this change, I've added a fuzzy comparison
for doubles, using an epsilon of 10^-16.
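
As a rough illustration of the *>=* check and the fuzzy comparison described above, here is a
minimal Java sketch. It is not the code from the commit: the class and method names are
hypothetical, and it assumes a 4096-byte page, a 64 KB upper bound on the buffer, and a page
cross probability equal to the record size modulo the page size, divided by the page size.

```java
// Hypothetical sketch; names and constants are assumptions, not the committed code.
final class PageCrossSketch {
    static final int PAGE_SIZE = 4096;          // assumed virtual page size
    static final int MAX_BUFFER_SIZE = 1 << 16; // assumed 64 KB upper bound
    static final double EPSILON = 1e-16;        // epsilon mentioned in the comment

    /** Fuzzy "a >= b": true when a is greater than b or within EPSILON of it. */
    static boolean fuzzyGreaterOrEqual(double a, double b) {
        return a - b > -EPSILON;
    }

    /** Buffer size for a record, adding one page when a page crossing is likely enough. */
    static int bufferSize(long recordSize, double pageCrossChance) {
        // Assumed model: record start offsets are uniformly distributed within a page.
        double crossProbability = (recordSize % PAGE_SIZE) / (double) PAGE_SIZE;

        long size = recordSize;
        if (fuzzyGreaterOrEqual(crossProbability, pageCrossChance)) {
            size += PAGE_SIZE; // with ">=", a chance of zero always adds a page
        }

        // Round up to a whole number of pages, then clamp to [PAGE_SIZE, MAX_BUFFER_SIZE].
        long rounded = (size + PAGE_SIZE - 1) & ~(long) (PAGE_SIZE - 1);
        return (int) Math.max(PAGE_SIZE, Math.min(rounded, MAX_BUFFER_SIZE));
    }

    public static void main(String[] args) {
        System.out.println(bufferSize(4096, 0.0)); // boundary case, chance of zero
        System.out.println(bufferSize(4096, 0.1)); // boundary case, non-zero chance
        System.out.println(bufferSize(140, 0.1));  // small record
    }
}
```

Under this sketch, a record of exactly 4096 bytes with a page cross chance of zero gets the
extra page, which is the boundary case mentioned above; with a strict *>* it would not.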

I've updated the test files to remove the restriction on the population of the partition id;
I had no idea the default was so big. Your two other comments, on the number of operations and
threads, are well noted. I was planning on using a bigger number of operations (the small
number was just to test the platform), but I was unsure about the number of threads. And yes,
I will use syntax such as 100M from now on. :)

Unfortunately cstar_perf is not available at the moment: tests are getting stuck and failing
to progress on both blade_11 and blade_11_b, cc [~enigmacurry].

> Our default buffer size for (uncompressed) buffered reads should be smaller, and based
on the expected record size
> ------------------------------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-8894
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>              Labels: benedict-to-commit
>             Fix For: 3.x
>         Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml
> A large contributor to buffered reads being slower than mmapped reads is likely that we read
a full 64KB at once, when average record sizes may be as low as 140 bytes in our stress tests.
The TLB has only 128 entries on a modern core, and each read will touch 32 of these, meaning we
will almost never hit the TLB and will incur at least 30 unnecessary misses each time (as well
as the other costs of larger-than-necessary accesses). When working with an SSD there is little
to no benefit in reading more than 4KB at once, and in either case reading more data than we
need is wasteful. So I propose selecting a buffer size that is the next larger power of 2 than
our average record size (with a minimum of 4KB), so that we expect to read a record in one
operation. I also propose that we create a pool of these buffers up front, and that we ensure
they are all exactly aligned to a virtual page, so that the source and target operations each
touch exactly one virtual page per 4KB of expected record size.
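
The arithmetic behind this proposal can be sketched in Java as follows. This is only an
illustration: the class and method names are hypothetical, "next larger power of 2" is read
here as the smallest power of two that is at least the average record size, and the figure of
32 TLB entries is read as 16 source pages plus 16 target pages for a 64 KB read with 4 KB pages.

```java
// Hypothetical sketch of the sizing rule proposed in the description.
final class BufferSizingSketch {
    static final int PAGE_SIZE = 4096; // assumed 4 KB virtual page and minimum buffer size

    /** Smallest power of two >= averageRecordSize, but never below one page. */
    static int proposedBufferSize(int averageRecordSize) {
        if (averageRecordSize <= PAGE_SIZE) {
            return PAGE_SIZE;
        }
        if (averageRecordSize > (1 << 30)) {
            throw new IllegalArgumentException("record size too large for an int-sized buffer");
        }
        // For n > 1, highestOneBit(n - 1) << 1 rounds n up to a power of two.
        return Integer.highestOneBit(averageRecordSize - 1) << 1;
    }

    /** Pages touched by one read, counting source and target pages separately. */
    static int pagesTouched(int bufferSize) {
        return 2 * (bufferSize / PAGE_SIZE);
    }

    public static void main(String[] args) {
        int current = 64 * 1024;                 // the 64 KB read size from the description
        int proposed = proposedBufferSize(140);  // the 140-byte record from the description
        System.out.println("current pages touched:  " + pagesTouched(current));
        System.out.println("proposed buffer size:   " + proposed);
        System.out.println("proposed pages touched: " + pagesTouched(proposed));
    }
}
```

Rounding to a power of two keeps every buffer a whole multiple of the page size once the
4 KB minimum applies, which is what makes page-aligned pooling straightforward.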

This message was sent by Atlassian JIRA
