cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-10143) Apparent counter overcount during certain network partitions
Date Thu, 20 Aug 2015 18:05:46 GMT
Joel Knighton created CASSANDRA-10143:
-----------------------------------------

             Summary: Apparent counter overcount during certain network partitions
                 Key: CASSANDRA-10143
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10143
             Project: Cassandra
          Issue Type: Bug
            Reporter: Joel Knighton


This issue is reproducible in this [Jepsen Test|https://github.com/riptano/jepsen/blob/f45f5320db608d48de2c02c871aecc4910f4d963/cassandra/test/cassandra/counter_test.clj#L16].

The test starts a five-node cluster and issues increments by one against a single counter.
It then checks that the counter is in the range [OKed increments, OKed increments + Write
Timeouts] at each read. Increments are issued at CL.ONE and reads at CL.ALL.  Throughout the
test, network failures are induced that create halved network partitions. A halved network
partition splits the cluster into three connected nodes and two connected nodes, randomly.
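
The check and the partition shape described above can be sketched roughly as follows. This is a simplified Python illustration, not the Jepsen test's actual (Clojure) code: the names `halve`, `read_in_bounds`, and `check_history` and the tuple-based history format are hypothetical, and the history is assumed to be fully serialized, so in-flight operations are ignored.

```python
import random


def halve(nodes, rng=random):
    """Randomly split nodes into a larger and a smaller connected component."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    cut = (len(shuffled) + 1) // 2  # larger side gets the odd node: 3 of 5
    return shuffled[:cut], shuffled[cut:]


def read_in_bounds(value, acked, timed_out):
    """True if a read lies in [acked, acked + timed_out]."""
    return acked <= value <= acked + timed_out


def check_history(history):
    """Return the reads that fall outside the allowed range.

    history is an iterable of (op, outcome, value) tuples in completion
    order (an assumed format): op is "inc" or "read", outcome is "ok" or
    "timeout", and value is the counter value for reads, else None.
    """
    acked = 0      # increments acknowledged as OK
    timed_out = 0  # increments that ended in a write timeout
    violations = []
    for op, outcome, value in history:
        if op == "inc" and outcome == "ok":
            acked += 1
        elif op == "inc" and outcome == "timeout":
            timed_out += 1
        elif op == "read" and outcome == "ok":
            if not read_in_bounds(value, acked, timed_out):
                violations.append((value, acked, timed_out))
    return violations
```

Under these assumptions, an overcount shows up as a read whose value exceeds `acked + timed_out`, since a timed-out increment may or may not have been applied but can be applied at most once.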

This test started failing; bisecting showed that a change to the test itself caused the
failure. When the network partitions are induced in a cycle of 15s healthy/45s partitioned
or 20s healthy/45s partitioned, the test fails. When network partitions are induced in a
cycle of 15s healthy/60s partitioned, 20s healthy/45s partitioned, or 20s healthy/60s partitioned,
the test passes.

There is nothing unusual in the logs of the nodes for the failed tests. The results are very
reproducible.

One noticeable trend is that more reads seem to get serviced during the failed tests.

Most testing has been done on 2.1.8. The same issue appears to be present in 2.2/3.0/trunk,
but I haven't spent as much time reproducing it there.

Ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
