Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Thu, 30 Jul 2015 13:18:05 +0000 (UTC)
From: "T Jake Luciani (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12779092.1425394853000.330138.1438262285425@Atlassian.JIRA>
In-Reply-To: <JIRA.12779092.1425394853000@Atlassian.JIRA>
References: <JIRA.12779092.1425394853000@Atlassian.JIRA>
 <JIRA.12779092.1425394853347@arcas>
Subject: [jira] [Updated] (CASSANDRA-8894) Our default buffer size for
 (uncompressed) buffered reads should be smaller, and based on the expected
 record size
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


     [ https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-8894:
--------------------------------------
    Fix Version/s:     (was: 3.0 alpha 1)
                   3.0 beta 1

> Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>              Labels: benedict-to-commit
>             Fix For: 3.0 beta 1
>
>         Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml
>
>
> A large contributor to buffered reads being slower than mmapped reads is likely that we read a full 64KB at once, when average record sizes may be as low as 140 bytes in our stress tests. The TLB has only 128 entries on a modern core, and each read will touch 32 of these (16 pages of 4KB each for the source and for the target), meaning we will almost never hit the TLB and will incur at least 30 unnecessary misses each time (as well as the other costs of larger-than-necessary accesses). When working with an SSD there is little to no benefit in reading more than 4KB at once, and in either case reading more data than we need is wasteful. So, I propose selecting a buffer size that is the next power of 2 above our average record size (with a minimum of 4KB), so that we expect to read a record in one operation. I also propose that we create a pool of these buffers up-front, and that we ensure they are all exactly aligned to a virtual page, so that the source and target operations each touch exactly one virtual page per 4KB of expected record size.
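
The sizing rule in the description can be sketched roughly as follows. This is only an illustration, not the patch attached to the ticket: the class, constants and method names are hypothetical, a 4KB virtual page is assumed, the 64KB upper bound is an assumption taken from the current default, and "next power of 2" is read as the smallest power of two that is at least the average record size. Buffer pooling and page alignment are not covered.

```java
// Sketch only: class, constant and method names are hypothetical, not Cassandra's API.
final class BufferSizing {
    // Assumed virtual page size; also the minimum buffer size from the proposal.
    static final int PAGE_SIZE = 4 * 1024;
    // Assumed upper bound: the 64KB default the ticket wants to move away from.
    static final int MAX_BUFFER_SIZE = 64 * 1024;

    /** Smallest power of two >= averageRecordSize, clamped to [PAGE_SIZE, MAX_BUFFER_SIZE]. */
    static int bufferSizeFor(long averageRecordSize) {
        if (averageRecordSize <= PAGE_SIZE) return PAGE_SIZE;
        if (averageRecordSize >= MAX_BUFFER_SIZE) return MAX_BUFFER_SIZE;
        // highestOneBit(n - 1) << 1 rounds n up to a power of two (n itself if it already is one).
        return Integer.highestOneBit((int) averageRecordSize - 1) << 1;
    }

    /** Pages touched per read: one set for the source, one for a page-aligned target buffer. */
    static int pagesTouched(int bufferSize) {
        return 2 * (bufferSize / PAGE_SIZE);
    }

    public static void main(String[] args) {
        for (long avg : new long[] {140, 5_000, 20_000}) {
            int size = bufferSizeFor(avg);
            System.out.println(avg + " byte records: buffer of " + size + " bytes, "
                    + pagesTouched(size) + " pages touched per read");
        }
    }
}
```

Under these assumptions, pagesTouched gives 32 for the 64KB default and 2 for a 4KB buffer, which matches the figure of at least 30 avoidable TLB misses per read.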


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)