Pavel, 

Out of curiosity, did this start happening around the time of an update? Which version of Cassandra are you using?

Regards,


2014-06-19 16:10 GMT-03:00 Pavel Kogan <pavel.kogan@cortica.com>:
What a coincidence! The same thing happened today in my 7-node cluster as well.

Regards,
  Pavel


On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle <marcelo@s1mbi0se.com.br> wrote:
I have a 10-node cluster running Cassandra 2.0.8.

I am seeing these messages in the log when I run my code. All my code does is read data from a CF and, in some cases, write new data.

 WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 6165, exceeding specified threshold of 5120 by 1045.
 WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 21266, exceeding specified threshold of 5120 by 16146.
 WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 22978, exceeding specified threshold of 5120 by 17858.
 INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481) CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is 14.249755859375 (just-counted was 9.85302734375).  calculation took 3ms for 1024 cells
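
To get a feel for how often and by how much the batches exceed the threshold, the warnings could be summarized with a small script. This is only a minimal sketch: it assumes the log lines keep exactly the BatchStatement.java format shown above, and the names BATCH_WARN and summarize_batch_warnings are illustrative, not part of Cassandra.

```python
import re

# Assumption: warnings follow the BatchStatement.java format seen in the log above.
BATCH_WARN = re.compile(
    r"Batch of prepared statements for \[(?P<tables>[^\]]+)\] "
    r"is of size (?P<size>\d+), "
    r"exceeding specified threshold of (?P<threshold>\d+) by (?P<over>\d+)"
)


def summarize_batch_warnings(lines):
    """Return (count, max_size, threshold) for oversized-batch warnings.

    Lines that do not match the warning format (INFO lines, etc.) are skipped.
    Returns (0, None, None) when no warning is found.
    """
    sizes = []
    threshold = None
    for line in lines:
        match = BATCH_WARN.search(line)
        if match is None:
            continue
        sizes.append(int(match.group("size")))
        # The threshold is reported on every warning; keep the last one seen.
        threshold = int(match.group("threshold"))
    if not sizes:
        return 0, None, None
    return len(sizes), max(sizes), threshold
```

It can be fed the lines of a node's log file (for example, by iterating over the opened file) and compared across nodes to see whether the largest batches line up with the times nodes drop out.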

After some time, one node of the cluster goes down. It comes back after a few seconds, and then another node goes down. This keeps happening, so there is always one node down in the cluster: when one comes back, another one falls.

The only exception I see in the log is "connection reset by peer", which seems to be related to the gossip protocol and appears when a node goes down.

Any hints on what I could do to investigate this problem further?

Best regards,
Marcelo Valle.