incubator-cassandra-user mailing list archives

From "Laing, Michael" <michael.la...@nytimes.com>
Subject Re: Cassandra 2.0.2 - Frequent Read timeouts and delays in replication on 3-node cluster in AWS VPC
Date Tue, 19 Nov 2013 13:28:35 GMT
We had a similar problem when our nodes could not sync via NTP because of
our VPC ACL settings. -ml
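
One way to check for this on each node is to query an NTP server directly and look at the reported offset. NTP uses UDP port 123, and VPC network ACLs are stateless, so the return traffic has to be allowed as well. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the simple SNTP exchange is a stand-in for whatever NTP client is actually installed on the nodes.

```python
import socket
import struct
import time

NTP_PORT = 123
NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP) to 1970-01-01 (Unix)


def build_request():
    """Build a 48-byte SNTP client request (LI=0, version 3, mode 3)."""
    return b"\x1b" + 47 * b"\x00"


def parse_offset(packet, t_send, t_recv):
    """Estimate the clock offset in seconds (server clock minus local clock)."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    words = struct.unpack("!12I", packet[:48])

    def to_unix(seconds, fraction):
        return seconds - NTP_EPOCH_OFFSET + fraction / 2**32

    t_server_recv = to_unix(words[8], words[9])    # server receive timestamp
    t_server_xmit = to_unix(words[10], words[11])  # server transmit timestamp
    # Standard NTP offset estimate from the four timestamps.
    return ((t_server_recv - t_send) + (t_server_xmit - t_recv)) / 2


def query_offset(server, timeout=2.0):
    """Query one NTP server; times out (socket.timeout) if no reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t_send = time.time()
        sock.sendto(build_request(), (server, NTP_PORT))
        packet, _ = sock.recvfrom(512)
        t_recv = time.time()
    return parse_offset(packet, t_send, t_recv)
```

Calling `query_offset("pool.ntp.org")` on each node (the server name is only a placeholder) should return a small offset. A timeout suggests UDP 123 is being blocked somewhere, and offsets that differ noticeably between nodes suggest the clocks have drifted apart, which matters because Cassandra resolves conflicting writes by timestamp.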


On Mon, Nov 18, 2013 at 8:49 PM, Steven A Robenalt <srobenal@stanford.edu> wrote:

> Hi all,
>
> I am attempting to bring up our new app on a 3-node cluster and am having
> problems with frequent read timeouts and slow inter-node replication.
> Initially, these errors were mostly occurring in our app server, affecting
> 0.02%-1.0% of our queries in an otherwise unloaded cluster. No exceptions
> were logged on the servers in this case, and reads in a single node
> environment with the same code and client driver virtually never see
> exceptions like this, so I suspect problems with the inter-node
> communication within the cluster.
>
> The 3 nodes are deployed in a single AWS VPC, and are all in a common
> subnet. The Cassandra version is 2.0.2 following an upgrade this past
> weekend due to NPEs in a secondary index that were affecting certain
> queries under 2.0.1. The servers are m1.large instances running AWS Linux
> and Oracle JDK7u40. The first 2 nodes in the cluster are the seed nodes.
> All database contents are CQL tables with a replication factor of 3, and the
> application is Java-based, using the latest DataStax 2.0.0-rc1 Java Driver.
>
> In testing with the application this afternoon, I noticed that newly
> written data in the same table differed across the 3 nodes for periods
> exceeding several minutes, as reported by cqlsh on each node. Running cqlsh
> from a single server against each of the hosts also produced timeouts, both
> on multiple connection attempts and on some queries, though every attempt
> eventually succeeded and the data was eventually fully replicated on all
> nodes.
>
> The AWS servers have a security group with only ports 22, 7000, 9042, and
> 9160 open.
>
> At this time, it seems that either I am still missing something in my
> cluster configuration, or maybe there are other ports that are needed for
> inter-node communication.
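
For reference, in a default configuration 7000 is the inter-node storage port, 9042 is the native CQL port, and 9160 is the Thrift port, so the list above covers replication traffic. JMX (7199 by default) is only needed for remote nodetool access. A quick way to confirm that each node can actually reach the others on those ports is a plain TCP connect test. This is a minimal sketch, and `check_ports` is an illustrative helper, not part of Cassandra or the driver.

```python
import socket

# Default Cassandra TCP ports mentioned in this thread.
CASSANDRA_PORTS = {
    7000: "inter-node storage",
    9042: "CQL native transport",
    9160: "Thrift",
}


def check_ports(host, ports, timeout=2.0):
    """Return {port: True/False} for whether a TCP connection succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and unreachable hosts.
            results[port] = False
    return results
```

Running `check_ports("10.0.0.11", CASSANDRA_PORTS)` from each node against the other two (the address is a placeholder) shows whether the security group and ACLs let TCP traffic through. It does not cover UDP, so it will not detect a blocked NTP port.
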
>
> Any advice/suggestions would be appreciated.
>
>
>
> --
> Steve Robenalt
> Software Architect
> HighWire | Stanford University
> 425 Broadway St, Redwood City, CA 94063
>
> srobenal@stanford.edu
> http://highwire.stanford.edu
