incubator-cassandra-user mailing list archives

From Steven A Robenalt <srobe...@stanford.edu>
Subject Re: Cassandra 2.0.2 - Frequent Read timeouts and delays in replication on 3-node cluster in AWS VPC
Date Tue, 19 Nov 2013 16:53:25 GMT
Thanks Michael, I will try that out.
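
For anyone else following the thread, here is a minimal Python sketch of how one might check whether a node can reach an NTP server at all, and roughly how far its clock is off. It is only an illustration: the server name "pool.ntp.org" and the function names are placeholders I made up, not anything from our setup. If the VPC ACL blocks UDP port 123, the query should simply time out, which is itself a useful signal.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_DELTA = 2208988800


def clock_offset(t1, t2, t3, t4):
    """Standard NTP offset estimate, in seconds.

    t1: client send time, t2: server receive time,
    t3: server transmit time, t4: client receive time.
    A positive result means the local clock is behind the server.
    """
    return ((t2 - t1) + (t3 - t4)) / 2.0


def ntp_to_unix(seconds, fraction):
    """Convert a 32.32 fixed-point NTP timestamp to Unix seconds."""
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def query_offset(server="pool.ntp.org", timeout=2.0):
    """Send one SNTP request over UDP port 123 and estimate the clock offset.

    The default server is an assumption; substitute whatever the nodes
    are configured to use. Raises OSError (including a timeout) if no
    reply arrives, for example when an ACL drops the traffic.
    """
    request = b"\x1b" + 47 * b"\0"  # LI=0, version=3, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
        t4 = time.time()
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    # Bytes 32-39: receive timestamp; bytes 40-47: transmit timestamp.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", reply[32:48])
    return clock_offset(t1, ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f), t4)
```

Running query_offset() on each of the 3 nodes and comparing the results would give a rough picture of how far apart their clocks are. A timeout on every node would point at the ACL rather than at Cassandra.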


On Tue, Nov 19, 2013 at 5:28 AM, Laing, Michael
<michael.laing@nytimes.com> wrote:

> We had a similar problem when our nodes could not sync using NTP due to
> VPC ACL settings. -ml
>
>
> On Mon, Nov 18, 2013 at 8:49 PM, Steven A Robenalt <srobenal@stanford.edu> wrote:
>
>> Hi all,
>>
>> I am attempting to bring up our new app on a 3-node cluster and am having
>> problems with frequent read timeouts and slow inter-node replication.
>> Initially, these errors were mostly occurring in our app server, affecting
>> 0.02%-1.0% of our queries in an otherwise unloaded cluster. No exceptions
>> were logged on the servers in this case, and reads in a single node
>> environment with the same code and client driver virtually never see
>> exceptions like this, so I suspect problems with the intra-cluster
>> communication between nodes.
>>
>> The 3 nodes are deployed in a single AWS VPC, and are all in a common
>> subnet. The Cassandra version is 2.0.2 following an upgrade this past
>> weekend due to NPEs in a secondary index that were affecting certain
>> queries under 2.0.1. The servers are m1.large instances running AWS Linux
>> and Oracle JDK7u40. The first 2 nodes in the cluster are the seed nodes.
>> All database contents are CQL tables with replication factor of 3, and the
>> application is Java-based, using the latest DataStax 2.0.0-rc1 Java Driver.
>>
>> In testing with the application, I noticed this afternoon that the
>> contents of the 3 nodes differed in their respective copies of the same
>> table for newly written data, for time periods exceeding several minutes,
>> as reported by cqlsh on each node. Specifying different hosts from the same
>> server using cqlsh also exhibited timeouts on multiple attempts to connect,
>> and on executing some queries, though they eventually succeeded in all
>> cases, and eventually the data in all nodes was fully replicated.
>>
>> The AWS servers have a security group with only ports 22, 7000, 9042, and
>> 9160 open.
>>
>> At this time, it seems that either I am still missing something in my
>> cluster configuration, or maybe there are other ports that are needed for
>> inter-node communication.
>>
>> Any advice/suggestions would be appreciated.
>>
>>
>>
>> --
>> Steve Robenalt
>> Software Architect
>> HighWire | Stanford University
>> 425 Broadway St, Redwood City, CA 94063
>>
>> srobenal@stanford.edu
>> http://highwire.stanford.edu
>


-- 
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobenal@stanford.edu
http://highwire.stanford.edu
