incubator-cassandra-user mailing list archives

From Nate McCall <n...@thelastpickle.com>
Subject Re: High latency on 5 node Cassandra Cluster
Date Wed, 04 Jun 2014 17:33:07 GMT
That is a pretty old version of Cassandra at this point.

If you are using counters anywhere, you are probably seeing
https://issues.apache.org/jira/browse/CASSANDRA-4578 which only shows up
after you hit some arbitrary traffic threshold.

If you don't want to upgrade (you really should), a fix for the above went
into the 1.0 branch but was never released:
https://github.com/apache/cassandra/blob/cassandra-1.0/CHANGES.txt#L2


On Wed, Jun 4, 2014 at 2:12 AM, Arup Chakrabarti <arup@pagerduty.com> wrote:

> Hello. We had some major latency problems yesterday with our 5 node
> Cassandra cluster. Wanted to get some feedback on where we could start to
> look to figure out what was causing the issue. If there is more info I
> should provide, please let me know.
>
> Here are the basics of the cluster:
> Clients: Hector and Cassie
> Size: 5 nodes (2 in AWS US-West-1, 2 in AWS US-West-2, 1 in Linode Fremont)
> Replication Factor: 5
> Quorum Reads and Writes enabled
> Read Repair set to true
> Cassandra Version: 1.0.12
>
> We started experiencing catastrophic latency from our app servers. We
> believed at the time this was due to compactions running, and the clients
> were not re-routing appropriately, so we disabled thrift on a single node
> that had high load. This did not resolve the issue. After that, we stopped
> gossip on the same node that had high load on it; again, this did not
> resolve anything. We then took down gossip on another node (leaving 3/5 up)
> and that fixed the latency from the application side. For a period of ~4
> hours, every time we would try to bring up a fourth node, the app would see
> the latency again. We then rotated the three nodes that were up to make
> sure it was not a networking event related to a single region/provider and
> we kept seeing the same problem: 3 nodes showed no latency problem, 4 or 5
> nodes would. After the ~4 hours, we brought the cluster up to 5 nodes and
> everything was fine.
>
> We currently have some ideas on what caused this behavior, but has anyone
> else seen this type of problem where a full cluster causes problems, but
> removing nodes fixes it? Any input on what to look for in our logs to
> understand the issue?
>
> Thanks
>
> Arup
>
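
For reference, the quorum arithmetic behind the setup quoted above can be sketched as follows. This is a minimal illustration, assuming the usual definition of QUORUM as floor(RF / 2) + 1 replicas. The function names are hypothetical and not part of Cassandra, Hector, or Cassie.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write.

    Assumes the usual definition: floor(RF / 2) + 1.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def quorum_available(replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are up to satisfy QUORUM (hypothetical helper)."""
    return live_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    rf = 5  # replication factor from the cluster description
    print(f"RF={rf}: QUORUM needs {quorum(rf)} replicas")
    for live in range(rf, 0, -1):
        status = "can" if quorum_available(rf, live) else "cannot"
        print(f"{live} node(s) up: QUORUM {status} be satisfied")
```

With RF 5, QUORUM needs 3 replicas, so the 3-node state described above is the smallest one that can still serve quorum reads and writes. This only shows why requests kept succeeding with two nodes down; it does not explain the latency seen with 4 or 5 nodes up.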



-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com