Date: Fri, 21 Aug 2015 14:13:47 +0000 (UTC)
From: "Stefania (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

    [ https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706777#comment-14706777 ]

Stefania commented on CASSANDRA-8630:
-------------------------------------

Thanks for your comments.

bq. For ChecksummedDataInput, we can just update the crc whenever we exhaust the buffer, and on calling getCrc() we can update with whatever we have read so far in the current buffer.

I tried that and it didn't work. HintsReader wraps a reader, but it still uses the underlying reader to read the crc values. The crc is stored in the same stream as the data, yet its bytes must be excluded from the crc update.
In other words, the crc has to be updated in place, at the time of each read; looking at the buffer content afterwards is not sufficient.

bq. Introducing an extra forceSlowPath property in the superclass to every single call is something I would prefer we avoid.

I don't like it either, but short of overloading all read methods we would need to rethink how HintsReader updates the crc, unless you have another idea.

I agree on all other points you've raised.

> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Stefania
>              Labels: compaction, performance
>             Fix For: 3.x
>
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, mmaped_uncomp_hotspot.png
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc), a lot of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because the default implementations of readShort, readLong, etc, as well as their matching write* methods, are implemented with numerous byte-by-byte read and write calls.
> This makes a lot of syscalls as well.
> A quick microbench shows that just reimplementing these methods either way gives an 8x speed increase.
> The attached patch implements the RandomAccessReader.read and SequentialWriter.write methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and ColumnNameHelper.maxComponents, which were on my profiler's hotspot method list during tests.
> Stress tests on my laptop show that this patch makes compaction 25-30% faster on uncompressed sstables and 15% faster on compressed ones.
> A deployment to production shows much less CPU load for compaction.
> (I attached a cpu load graph from one of our production systems; orange is niced CPU load, i.e. compaction; yellow is user, i.e. tasks not related to compaction)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
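
For illustration, the crc problem described in the comment above can be sketched roughly as follows. This is a minimal, hypothetical sketch and not Cassandra's ChecksummedDataInput or HintsReader: the class ChecksummedInputSketch, its method names, and the record layout (a 4-byte payload followed by the 4-byte CRC32 of that payload, in the same stream) are all assumptions made up for the example. It only tries to show why updating the crc in place on each payload read behaves differently from checksumming whatever bytes have been consumed from the buffer.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch; names and record layout are assumptions, not Cassandra code.
final class ChecksummedInputSketch {
    private final ByteBuffer buffer;
    private final CRC32 crc = new CRC32();

    ChecksummedInputSketch(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    // Payload read: the bytes just consumed are folded into the running crc in place.
    int readInt() {
        int start = buffer.position();
        int value = buffer.getInt();
        ByteBuffer consumed = buffer.duplicate();
        consumed.limit(start + Integer.BYTES);
        consumed.position(start);
        crc.update(consumed);
        return value;
    }

    // Reads a stored crc from the same stream WITHOUT touching the running crc.
    int readStoredCrc() {
        return buffer.getInt();
    }

    int crc() {
        return (int) crc.getValue();
    }

    // Builds one record: payload (4 bytes) followed by CRC32 of the payload (4 bytes).
    static ByteBuffer record(int payload) {
        ByteBuffer out = ByteBuffer.allocate(2 * Integer.BYTES);
        out.putInt(payload);
        CRC32 c = new CRC32();
        c.update(out.array(), 0, Integer.BYTES);
        out.putInt((int) c.getValue());
        out.flip();
        return out;
    }

    // In-place approach: only payload reads update the crc, so it matches the stored value.
    static boolean verifyInPlace(ByteBuffer rec) {
        ChecksummedInputSketch in = new ChecksummedInputSketch(rec.duplicate());
        in.readInt();
        int computed = in.crc();
        return in.readStoredCrc() == computed;
    }

    // Buffer-content approach: checksums every consumed byte, which also covers the
    // stored crc itself, so the comparison will generally not match.
    static boolean verifyFromBufferContents(ByteBuffer rec) {
        ByteBuffer d = rec.duplicate();
        d.getInt();
        int stored = d.getInt();
        CRC32 c = new CRC32();
        c.update(rec.array(), rec.arrayOffset(), d.position());
        return stored == (int) c.getValue();
    }

    public static void main(String[] args) {
        ByteBuffer rec = record(42);
        System.out.println("in-place update matches stored crc: " + verifyInPlace(rec));
        System.out.println("buffer-content update matches stored crc: " + verifyFromBufferContents(rec));
    }
}
```

The separate readStoredCrc() method plays the role that reading through the underlying reader plays in the comment: it consumes bytes from the same stream while keeping them out of the checksum.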