Date: Wed, 30 Apr 2014 17:50:28 +0000 (UTC)
From: "Joshua McKenzie (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-6890) Standardize on a single read path

    [ https://issues.apache.org/jira/browse/CASSANDRA-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985821#comment-13985821 ]

Joshua McKenzie edited comment on CASSANDRA-6890 at 4/30/14 5:49 PM:
---------------------------------------------------------------------

Running cql3 native with snappy compression and a mixed load at a 1-to-1 read/write ratio looks to have normalized much of the performance differential I was seeing. Using linux mmap as the baseline:

Raw op/s:
| |windows buffered|windows mmap|linux buffered|linux mmap|
|  4 threadCount|2236|2171|1953|2111|
|  8 threadCount|4716|4673|3955|4300|
| 16 threadCount|7605|7529|6795|7465|
| 24 threadCount|8662|9231|8341|8819|
| 36 threadCount|13907|13147|13237|14451|
| 54 threadCount|24039|24817|24177|26073|
| 81 threadCount|39016|43673|34154|40929|
|121 threadCount|40494|49513|42658|48313|
|181 threadCount|53189|53039|49691|52885|
|271 threadCount|53447|55354|54842|58779|
|406 threadCount|54853|54295|60108|64675|
|609 threadCount|60067|56145|61823|70885|
|913 threadCount|57333|58483|60763|70398|

% Comparison:
| |windows buffered|windows mmap|linux buffered|linux mmap|
|  4 threadCount|105.92%|102.84%|92.52%|100.00%|
|  8 threadCount|109.67%|108.67%|91.98%|100.00%|
| 16 threadCount|101.88%|100.86%|91.02%|100.00%|
| 24 threadCount|98.22%|104.67%|94.58%|100.00%|
| 36 threadCount|96.24%|90.98%|91.60%|100.00%|
| 54 threadCount|92.20%|95.18%|92.73%|100.00%|
| 81 threadCount|95.33%|106.70%|83.45%|100.00%|
|121 threadCount|83.82%|102.48%|88.30%|100.00%|
|181 threadCount|100.57%|100.29%|93.96%|100.00%|
|271 threadCount|90.93%|94.17%|93.30%|100.00%|
|406 threadCount|84.81%|83.95%|92.94%|100.00%|
|609 threadCount|84.74%|79.21%|87.22%|100.00%|
|913 threadCount|81.44%|83.07%|86.31%|100.00%|

As Benedict indicated, an in-process page cache should make the debate between these two paths moot. The results above are quite close to the 10% threshold you've indicated, Jonathan; I'd be comfortable standardizing on buffered I/O leading up to 3.0 to give us a single read path to migrate to an in-process page cache.
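For clarity, the % comparison table is just each raw op/s figure divided by the linux mmap figure for the same threadCount; a minimal sketch of that calculation using the 4-threadCount numbers from the tables above:

{code}
# Minimal sketch: derive a % comparison row from a raw op/s row,
# using linux mmap as the 100% baseline (4-threadCount numbers above).
raw_ops = {
    "windows buffered": 2236,
    "windows mmap": 2171,
    "linux buffered": 1953,
    "linux mmap": 2111,  # baseline
}

baseline = raw_ops["linux mmap"]
for config, ops in raw_ops.items():
    # e.g. windows buffered -> 105.92%, linux buffered -> 92.52%
    print(f"{config:>16}: {ops / baseline:7.2%}")
{code}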
I certainly don't see a need for us to keep the mmap'ed path on Windows, as there doesn't appear to be a performance differential when using a more representative workload on cql3.

As an aside, do we have a documented set of suggestions on how people should approach stress-testing Cassandra, or perhaps a set of performance regression tests we run against releases? Nothing beats specialized expertise in tuning the stress workload to your expected usage patterns, but it might help to give people a baseline and a starting point for their own testing.

Pavel: I did record perf runs for both the buffered and memory-mapped paths on linux, but given how close the results above are, I don't know how much value we'll be able to pull from them. I can attach them to the ticket if you're still interested.
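For anyone wanting to reproduce the buffered vs. mmap'ed comparison above, the read path is chosen via the disk_access_mode setting in cassandra.yaml. A hedged sketch only; the setting is not always present in the shipped yaml, and defaults can vary by release, so verify the values against your version:

{code}
# Sketch: which read path the SSTable readers use (2.x DiskAccessMode values).
disk_access_mode: standard          # buffered I/O for data and index files
# disk_access_mode: mmap            # memory-map data and index files
# disk_access_mode: mmap_index_only # memory-map index files only
# disk_access_mode: auto            # mmap on 64-bit JVMs, standard otherwise
{code}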
> Standardize on a single read path
> ---------------------------------
>
>                 Key: CASSANDRA-6890
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6890
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Joshua McKenzie
>            Assignee: Joshua McKenzie
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: mmap_gc.jpg, mmap_jstat.txt, mmap_perf.txt, nommap_gc.jpg, nommap_jstat.txt
>
>
> Since we actively unmap unreferenced SSTRs and also copy data out of those readers on the read path, the current memory-mapped I/O is a lot of complexity for very little payoff. Clean out the mmap'ed I/O on the read path.

--
This message was sent by Atlassian JIRA
(v6.2#6252)