Date: Wed, 30 Apr 2014 17:50:28 +0000 (UTC)
From: "Joshua McKenzie (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-6890) Standardize on a single read path

    [ https://issues.apache.org/jira/browse/CASSANDRA-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985821#comment-13985821 ]

Joshua McKenzie edited comment on CASSANDRA-6890 at 4/30/14 5:49 PM:
---------------------------------------------------------------------

Running cql3 native with snappy compression and a mixed load at a 1-to-1 read/write ratio looks to have normalized much of the performance differential I was seeing. Using linux mmap as the baseline:

Raw op/s:
| |windows buffered|windows mmap|linux buffered|linux mmap|
|  4 threadCount|2236|2171|1953|2111|
|  8 threadCount|4716|4673|3955|4300|
| 16 threadCount|7605|7529|6795|7465|
| 24 threadCount|8662|9231|8341|8819|
| 36 threadCount|13907|13147|13237|14451|
| 54 threadCount|24039|24817|24177|26073|
| 81 threadCount|39016|43673|34154|40929|
|121 threadCount|40494|49513|42658|48313|
|181 threadCount|53189|53039|49691|52885|
|271 threadCount|53447|55354|54842|58779|
|406 threadCount|54853|54295|60108|64675|
|609 threadCount|60067|56145|61823|70885|
|913 threadCount|57333|58483|60763|70398|

% Comparison:
| |windows buffered|windows mmap|linux buffered|linux mmap|
|  4 threadCount|105.92%|102.84%|92.52%|100.00%|
|  8 threadCount|109.67%|108.67%|91.98%|100.00%|
| 16 threadCount|101.88%|100.86%|91.02%|100.00%|
| 24 threadCount|98.22%|104.67%|94.58%|100.00%|
| 36 threadCount|96.24%|90.98%|91.60%|100.00%|
| 54 threadCount|92.20%|95.18%|92.73%|100.00%|
| 81 threadCount|95.33%|106.70%|83.45%|100.00%|
|121 threadCount|83.82%|102.48%|88.30%|100.00%|
|181 threadCount|100.57%|100.29%|93.96%|100.00%|
|271 threadCount|90.93%|94.17%|93.30%|100.00%|
|406 threadCount|84.81%|83.95%|92.94%|100.00%|
|609 threadCount|84.74%|79.21%|87.22%|100.00%|
|913 threadCount|81.44%|83.07%|86.31%|100.00%|

As Benedict indicated, an in-process page cache should make the debate between these two paths moot. The results above are quite close to the 10% threshold you've indicated, Jonathan; I'd be comfortable standardizing on buffered I/O leading up to 3.0 to give us a single read path to migrate to an in-process page cache.
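For clarity, the % comparison table is just each raw op/s figure divided by the linux mmap figure for the same threadCount; a minimal sketch of that calculation using the 4-threadCount numbers from the tables above:

{code}
# Minimal sketch: derive a % comparison row from a raw op/s row,
# using linux mmap as the 100% baseline (4-threadCount numbers above).
raw_ops = {
    "windows buffered": 2236,
    "windows mmap": 2171,
    "linux buffered": 1953,
    "linux mmap": 2111,  # baseline
}

baseline = raw_ops["linux mmap"]
for config, ops in raw_ops.items():
    # e.g. windows buffered -> 105.92%, linux buffered -> 92.52%
    print(f"{config:>16}: {ops / baseline:7.2%}")
{code}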
I certainly don't see a need for us to keep the mmap'ed path on Windows, as there doesn't appear to be a performance differential when using a more representative workload on cql3.

As an aside, do we have a documented set of suggestions on how people should approach stress-testing Cassandra, or perhaps a set of performance regression tests we run against releases? Nothing beats specialized expertise in tuning the stress workload to your expected usage patterns, but it might help to give people a baseline and a starting point for their own testing.

Pavel: I did record perf runs for both the buffered and memory-mapped paths on linux, but given how close the results above are, I don't know how much value we'll be able to pull from them. I can attach them to the ticket if you're still interested.
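For anyone wanting to reproduce the buffered vs. mmap'ed comparison above, the read path is chosen via the disk_access_mode setting in cassandra.yaml. A hedged sketch only; the setting is not always present in the shipped yaml, and defaults can vary by release, so verify the values against your version:

{code}
# Sketch: which read path the SSTable readers use (2.x DiskAccessMode values).
disk_access_mode: standard          # buffered I/O for data and index files
# disk_access_mode: mmap            # memory-map data and index files
# disk_access_mode: mmap_index_only # memory-map index files only
# disk_access_mode: auto            # mmap on 64-bit JVMs, standard otherwise
{code}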
> Standardize on a single read path
> ---------------------------------
>
>                 Key: CASSANDRA-6890
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6890
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Joshua McKenzie
>            Assignee: Joshua McKenzie
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: mmap_gc.jpg, mmap_jstat.txt, mmap_perf.txt, nommap_gc.jpg, nommap_jstat.txt
>
>
> Since we actively unmap unreferenced SSTRs and also copy data out of those readers on the read path, the current memory-mapped I/O is a lot of complexity for very little payoff. Clean out the mmap'ed I/O on the read path.

--
This message was sent by Atlassian JIRA
(v6.2#6252)