cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8894) Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
Date Fri, 17 Jul 2015 07:56:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630928#comment-14630928 ]

Stefania edited comment on CASSANDRA-8894 at 7/17/15 7:55 AM:
--------------------------------------------------------------

[~benedict] I went ahead and implemented the latest suggested optimization in
[this commit|https://github.com/stef1927/cassandra/commit/ad6712cdc12380ef0529a13ed6e9bd1c5cecebad].
I've also attached tentative stress yaml profiles, which I intend to run like this:

{code}
user profile=https://dl.dropboxusercontent.com/u/15683245/8894_tiny.yaml ops\(insert=1,\) n=100000 -rate threads=50
user profile=https://dl.dropboxusercontent.com/u/15683245/8894_tiny.yaml ops\(singleblob=1,\) n=100000 -rate threads=50
{code}

Can you confirm the profiles are what you intended: basically a partition id and a blob column
with the size distributed as you previously indicated? I'm not sure whether there is anything
else I should do to ensure reads mostly hit disk, other than spreading the partition id across
a big interval.

I created these additional branches:
- trunk-pre-8099
- 8894-pre-8099
- 8894-pre-8099-first-optim
- 8894-first-optim

The names are self-describing except for "first-optim", which means before implementing the
latest optimization. A tag would have been enough, but cstar perf does not support tags.

Unfortunately cstar perf has been giving me other problems besides the lack of tag support, cc [~enigmacurry]:

* The old trunk branches pre 8099 fail due to the schema table changes (http://cstar.datastax.com/tests/id/e134ee7e-2c46-11e5-a180-42010af0688f):
"InvalidQueryException: Keyspace system_schema does not exist". However, I think if we fake
version 2.2 in build.xml we should be OK.
* The new branches either fail because of a nodetool failure (http://cstar.datastax.com/tests/id/86abc144-2c55-11e5-87b9-42010af0688f)
or produce wrong graphs (http://cstar.datastax.com/tests/id/11fe9c5a-2c45-11e5-9760-42010af0688f).

Here is the nodetool failure:

{code}
[10.200.241.104] Executing task 'ensure_running'
[10.200.241.104] run: JAVA_HOME=~/fab/jvms/jdk1.8.0_45 ~/fab/cassandra/bin/nodetool ring
[10.200.241.104] out: error: null
[10.200.241.104] out: -- StackTrace --
[10.200.241.104] out: java.util.NoSuchElementException
[10.200.241.104] out: 	at com.google.common.collect.LinkedHashMultimap$1.next(LinkedHashMultimap.java:506)
[10.200.241.104] out: 	at com.google.common.collect.LinkedHashMultimap$1.next(LinkedHashMultimap.java:494)
[10.200.241.104] out: 	at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
[10.200.241.104] out: 	at java.util.Collections.max(Collections.java:708)
[10.200.241.104] out: 	at org.apache.cassandra.tools.nodetool.Ring.execute(Ring.java:63)
[10.200.241.104] out: 	at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:240)
[10.200.241.104] out: 	at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:154)
[10.200.241.104] out: 
[10.200.241.104] out: 
{code}
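
The trace above ends in `Collections.max`, which the JDK documents as throwing `NoSuchElementException` when given an empty collection, so one plausible reading is that `nodetool ring` had nothing to iterate over at that moment. The sketch below only illustrates that JDK behaviour and one possible guard; `EmptyMaxSketch`, `throwsOnEmpty` and `maxOrDefault` are hypothetical names for illustration and are not the code in Ring.java.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not taken from the Cassandra source tree.
final class EmptyMaxSketch {

    // Returns true if Collections.max rejects an empty collection.
    static boolean throwsOnEmpty() {
        try {
            Collections.max(Collections.<Integer>emptyList());
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    // Guarded variant: fall back to a caller-supplied default when there is nothing to compare.
    static int maxOrDefault(Collection<Integer> values, int fallback) {
        return values.isEmpty() ? fallback : Collections.max(values);
    }

    public static void main(String[] args) {
        System.out.println("empty input throws: " + throwsOnEmpty());
        System.out.println("guarded, empty: " + maxOrDefault(Collections.<Integer>emptyList(), 0));
        System.out.println("guarded, non-empty: " + maxOrDefault(Arrays.asList(3, 9, 4), 0));
    }
}
```

Whether a default is the right response, or whether the caller should report that the ring is not yet populated, depends on what the command is expected to print in that state.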

I'll resume the performance tests once cstar perf is stable again.





> Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>              Labels: benedict-to-commit
>             Fix For: 3.x
>
>         Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml
>
>
> A large contributor to buffered reads being slower than mmapped reads is likely that we read
a full 64KB at once, when average record sizes may be as low as 140 bytes on our stress tests.
The TLB has only 128 entries on a modern core, and each read will touch 32 of these, meaning we
will almost never hit the TLB and will incur at least 30 unnecessary misses each time (as well
as the other costs of larger than necessary accesses). When working with an SSD there is little
to no benefit in reading more than 4KB at once, and in either case reading more data than we
need is wasteful. So, I propose selecting a buffer size that is the next larger power of 2 than
our average record size (with a minimum of 4KB), so that we expect to read a record in one
operation. I also propose that we create a pool of these buffers up-front, and that we ensure
they are all exactly aligned to a virtual page, so that the source and target operations each
touch exactly one virtual page per 4KB of expected record size.
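
The sizing rule proposed in the description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the patch: the class and method names are hypothetical, "next larger power of 2" is read here as the smallest power of two that is at least the average record size, and the 64KB upper cap is an assumption taken from the read size the description mentions.

```java
// Hypothetical sketch of the proposed buffer sizing rule; names are not from the patch.
final class BufferSizeSketch {
    static final int MIN_BUFFER_SIZE = 4096;   // 4KB minimum, as proposed in the description
    static final int MAX_BUFFER_SIZE = 65536;  // assumed cap: the 64KB read size mentioned above

    // Round the average record size up to a power of two, clamped to [4KB, 64KB].
    static int bufferSizeFor(long averageRecordSize) {
        if (averageRecordSize <= MIN_BUFFER_SIZE)
            return MIN_BUFFER_SIZE;
        if (averageRecordSize >= MAX_BUFFER_SIZE)
            return MAX_BUFFER_SIZE;
        int size = (int) averageRecordSize;          // safe: strictly between 4KB and 64KB here
        int highest = Integer.highestOneBit(size);   // largest power of two <= size
        return highest == size ? highest : highest << 1;
    }

    public static void main(String[] args) {
        long[] samples = { 140, 5000, 8192, 100000 };
        for (long s : samples)
            System.out.println(s + " bytes -> " + bufferSizeFor(s) + " byte buffer");
    }
}
```

With these assumptions a 140-byte average record maps to a 4096-byte buffer, so each read touches one 4KB page for the source and one for the target rather than 16 each.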



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
