cassandra-commits mailing list archives

From "Paulo Motta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10874) running stress with compaction strategy and replication factor fails on read after write
Date Thu, 17 Dec 2015 01:34:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061292#comment-15061292 ]

Paulo Motta commented on CASSANDRA-10874:
-----------------------------------------

afaik the default stress consistency level is ONE for both writes and reads. Since you're
writing with 300 threads, it's expected that some mutations will be dropped due to overload
and that some ONE reads will fail, because the dropped mutations have not yet been hinted
(hints are only delivered after 10 minutes). That's why the problem doesn't occur without
replication, or with read CL = QUORUM.
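For intuition, here is a toy model (not Cassandra internals) of the worst case described above: with RF=3 and a write acknowledged at CL=ONE, possibly only one replica holds the row after mutations are dropped, and the chance that a read misses it shrinks as the read CL rises. The function name and the independence assumptions are illustrative only.

```python
# Toy model: probability that a read misses a row when only some replicas
# hold it, as a function of how many replicas the read must contact.
from fractions import Fraction
from itertools import combinations

def miss_probability(rf, replicas_with_row, read_cl):
    """Chance that a read contacting `read_cl` randomly chosen replicas
    (out of `rf`) sees none of the replicas that hold the row."""
    holders = set(range(replicas_with_row))
    total = misses = 0
    for contacted in combinations(range(rf), read_cl):
        total += 1
        if holders.isdisjoint(contacted):
            misses += 1
    return Fraction(misses, total)

# RF=3, worst case: only 1 replica has the acknowledged write.
print(miss_probability(3, 1, 1))  # CL=ONE:    2/3
print(miss_probability(3, 1, 2))  # CL=QUORUM: 1/3
print(miss_probability(3, 1, 3))  # CL=ALL:    0
```

Note that QUORUM reads after CL=ONE writes only reduce, not eliminate, the chance of a stale read; they make it far less likely that the stress validation trips over a replica that missed the mutation.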

are repairs completing, and do stress reads succeed after they finish? if so, I suspect those
might only be reporting/presentation errors.

> running stress with compaction strategy and replication factor fails on read after write
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10874
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10874
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Andrew Hust
>
> Running a read stress after a write stress, with a compaction strategy and a replication factor matching the node count, fails with an exception.
> {code}
> Operation x0 on key(s) [38343433384b34364c30]: Data returned was not validated
> {code}
> Example run:
> {code}
> ccm create stress -v git:cassandra-3.0 -n 3 -s
> ccm node1 stress write n=10M -rate threads=300 -schema replication\(factor=3\) compaction\(strategy=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy\)
> ccm node1 nodetool flush
> ccm node1 nodetool compactionstats # check until quiet
> ccm node1 stress read n=10M -rate threads=300
> {code}
> - This fails both with and without vnodes, but will occasionally pass without vnodes.
> - Changing the read phase to CL=QUORUM makes it pass.
> - Removing the replication factor on write makes it pass.
> - Happens with all compaction strategies.
> So with that in mind I attempted to add a repair after the write phase.  This leads to 1 of 2 outcomes.
> 1: a repair that reports greater than 100% completion; it usually stalls after a bit, but I have seen it reach >400% progress:
> {code}
>                                       id   compaction type    keyspace       table     completed         total    unit   progress
>     2d5344c0-9dc8-11e5-9d5f-4fdec8d76c27        Validation   keyspace1   standard1   94722609949   44035292145   bytes    215.11%
> {code}
> 2: a repair that has a greatly inflated completed/total value; it will crunch for a bit and then lock up:
> {code}
>                                      id   compaction type    keyspace       table   completed          total    unit   progress
>    8c4cf7f0-a34a-11e5-a321-777be88c58ae        Validation   keyspace1   standard1           0   874811100900   bytes      0.00%
> ❯ du -sh ~/.ccm/stress/node1/
> 2.4G  ~/.ccm/stress/node1/
> ❯ du -sh ~/.ccm/stress
> 7.1G  ~/.ccm/stress
> {code}
> This has been reproduced on cassandra-3.0 and cassandra-2.1, both locally and using cstar_perf (links below).
> A big twist is that cassandra-2.2 passes the majority of the time: it completes successfully without the repair in 8 out of 10 runs. This can be seen in the cstar_perf links below.
> cstar_perf runs:
> http://cstar.datastax.com/tests/id/c8fa27a4-a205-11e5-8fbc-0256e416528f
> http://cstar.datastax.com/tests/id/a254c572-a2ce-11e5-a8b9-0256e416528f



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
