Date: Wed, 8 Jul 2015 01:05:04 +0000 (UTC)
From: "Stefania (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8894) Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size

    [ https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617778#comment-14617778 ]

Stefania commented on CASSANDRA-8894:
-------------------------------------

Yes I assumed a normal distribution of the record size. Your suggestion of a uniform distribution of _start position_ within a page is more straight-forward however.
Let's start with that: {{size}} = 95th percentile, chance of crossing = {{(size % 4096) / 4096}}

Noted about adding the size percentile and the chance-of-crossing threshold to the config without mentioning them in the yaml. I'll also add a *global* setting to indicate whether the data directories are on SSD or spinning disk, and this will instead be in the yaml.

> Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>              Labels: benedict-to-commit
>             Fix For: 3.x
>
>
> A large contributor to slower buffered reads than mmapped is likely that we read a full 64Kb at once, when average record sizes may be as low as 140 bytes on our stress tests. The TLB has only 128 entries on a modern core, and each read will touch 32 of these, meaning we are unlikely to almost ever be hitting the TLB, and will be incurring at least 30 unnecessary misses each time (as well as the other costs of larger than necessary accesses). When working with an SSD there is little to no benefit reading more than 4Kb at once, and in either case reading more data than we need is wasteful. So, I propose selecting a buffer size that is the next larger power of 2 than our average record size (with a minimum of 4Kb), so that we expect to read in one operation. I also propose that we create a pool of these buffers up-front, and that we ensure they are all exactly aligned to a virtual page, so that the source and target operations each touch exactly one virtual page per 4Kb of expected record size.
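
For illustration only, here is a rough Python sketch of the arithmetic discussed in this message. It is not the Cassandra implementation: the function names (chance_of_crossing, proposed_buffer_size, pages_touched) are hypothetical, and the 4096-byte page size is the one assumed in the thread. The first function is the {{(size % 4096) / 4096}} estimate from the comment, the second is the "next power of 2, minimum 4Kb" rule from the issue description, and the third reproduces the count of 32 pages for a 64Kb read, assuming source and target each span 16 pages.

```python
PAGE_SIZE = 4096  # 4 KiB virtual page, as assumed in the discussion


def chance_of_crossing(record_size: int, page_size: int = PAGE_SIZE) -> float:
    """Estimate from the comment: (size % page) / page.

    Assumes the record's start position is uniformly distributed
    within a page.
    """
    return (record_size % page_size) / page_size


def proposed_buffer_size(record_size: int, minimum: int = PAGE_SIZE) -> int:
    """Smallest power of two >= record_size, but never below `minimum`.

    Sketch of the rule in the issue description; the handling of sizes
    that are already an exact power of two is an assumption.
    """
    if record_size <= 1:
        return minimum
    power_of_two = 1 << (record_size - 1).bit_length()
    return max(minimum, power_of_two)


def pages_touched(read_size: int, page_size: int = PAGE_SIZE) -> int:
    """Pages touched by one page-aligned read, counting source and target."""
    pages_per_side = -(-read_size // page_size)  # ceiling division
    return 2 * pages_per_side


if __name__ == "__main__":
    print(chance_of_crossing(140))      # 140-byte record from the description
    print(proposed_buffer_size(140))    # clamps to the 4 KiB minimum
    print(proposed_buffer_size(5000))   # rounds up to the next power of two
    print(pages_touched(64 * 1024))     # the 64Kb default read
```

With the 140-byte records mentioned in the description, the estimate gives roughly a 3.4% chance of crossing a page boundary and the rule selects a 4Kb buffer. A 64Kb read touches 32 pages under the stated assumption, which matches the figure in the description.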