cassandra-user mailing list archives

From "Walsh, Stephen" <Stephen.Wa...@Aspect.com>
Subject RE: Consistency Issues
Date Wed, 30 Sep 2015 15:22:16 GMT
More information:

I've just set up an NTP server to rule out any timing issues.
I also see this in the Cassandra node log files:

MessagingService-Incoming-/172.31.22.4] 2015-09-30 15:19:14,769 IncomingTcpConnection.java:97
- UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=cf411b50-6785-11e5-a435-e7be20c92086

Any idea what this is related to?
All these tests are run with a clean setup of Cassandra nodes followed by a nodetool repair,
before any data hits them.


From: Walsh, Stephen [mailto:Stephen.Walsh@Aspect.com]
Sent: 30 September 2015 15:17
To: user@cassandra.apache.org
Subject: Consistency Issues

Hi there,

We are having some issues with consistency. I'll try my best to explain.

We have an application that was able to
Write ~1000 p/s
Read ~300 p/s
Total CF created: 400
Total Keyspaces created : 80

On a 4 node Cassandra Cluster with
Version 2.1.6
Replication : 3
Consistency  (Read & Write) : LOCAL_QUORUM
Cores : 4
Ram : 15 GB
Heap Size 8GB

This was fine and worked, but was pushing our application to the max.

---------------------

Next we added a load balancer (HAProxy) to our application.
So now we have 3 of our application nodes talking to 4 Cassandra nodes with a load of
Write ~1250 p/s
Read 0 p/s
Total CF created: 450
Total Keyspaces created : 100

On our application we now see
Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica were required
but only 1 acknowledged the write)
(we are using java Cassandra driver 2.1.6)
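
For reference, the "2 replica were required" figure in that message follows from the replication factor rather than from the number of nodes. A minimal sketch of the arithmetic, assuming a single datacenter so that LOCAL_QUORUM is computed from the replication factor of 3 given above (the function name is illustrative, not part of any driver API):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM / LOCAL_QUORUM request."""
    return replication_factor // 2 + 1


# Replication factor 3, as in the setup above (assumed single datacenter).
# Cluster size is deliberately not an input to the calculation.
for nodes in (4, 5, 6):
    print(f"{nodes} nodes, RF=3 -> {quorum(3)} acks required")
```

Under that assumption, growing the cluster from 4 to 6 nodes leaves the required acknowledgements at 2 of the 3 replicas that own each partition.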

So we increased the number of Cassandra nodes
to 5, then 6, and each time got the same replication error.

So then we doubled the spec of every node to
8 cores
30GB  RAM
Heap size 15GB

And we still get this replication error (2 replica were required but only 1 acknowledged the
write)

We know that when we introduce the HAProxy load balancer with 3 of our nodes, it hits Cassandra
3 times quicker.
But we've now increased the Cassandra spec nearly 3-fold, for only an extra 250 writes
p/s, and it still doesn't work.
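
As a rough check on those numbers, the replica-level write rate per node can be estimated from the figures above. This is only a sketch: it assumes writes are spread evenly across the ring and that every client write is sent to all 3 replicas, and the function name is made up for illustration.

```python
def replica_writes_per_node(client_writes_per_sec: float,
                            replication_factor: int,
                            node_count: int) -> float:
    """Average replica writes each node handles per second (even spread assumed)."""
    return client_writes_per_sec * replication_factor / node_count


# Original setup: ~1000 writes p/s on 4 nodes, replication 3.
print(replica_writes_per_node(1000, 3, 4))

# After adding HAProxy: ~1250 writes p/s on 4, 5 and 6 nodes.
for nodes in (4, 5, 6):
    print(nodes, replica_writes_per_node(1250, 3, nodes))
```

By this estimate the per-node rate at 5 nodes is about the same as in the original 4-node setup and lower at 6, which is why the continued timeouts are hard to explain by write volume alone.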

We're having a hard time finding out why replication is still an issue with a cluster of this size.

We tried to get OpsCenter working to monitor the nodes, but due to the number of CFs in Cassandra
the datastax-agent takes 90% of the CPU on every node.

Any suggestion / recommendation would be very welcome.

Regards
Stephen Walsh



This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain
information that is confidential. If you have received this message in error, please do not
read, copy or forward this message. Please notify the sender immediately, delete it from your
system and destroy any copies. You may not further disclose or distribute this email or its
attachments.
