cassandra-commits mailing list archives

From "Ngoc Minh Vo (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-6352) Cluster does not respond to new SELECT query after a timeout
Date Fri, 15 Nov 2013 14:53:22 GMT
Ngoc Minh Vo created CASSANDRA-6352:

             Summary: Cluster does not respond to new SELECT query after a timeout
                 Key: CASSANDRA-6352
             Project: Cassandra
          Issue Type: Bug
         Environment: Windows 7, C* v2.0.xx, 4-node cluster, JVM 1.7.0_45-b18 Xmx16GB, Datastax Java Driver 1.0.4 and 2.0.0-beta2
            Reporter: Ngoc Minh Vo


We have encountered the following issue three times. Here is a description:
- Data is imported via the Datastax Java Driver (DJD) v2.0.0-b2 using BatchStatement (i.e. a batch
of PreparedStatements). The import performance is quite impressive.
- If we query the cluster via cqlsh (C* 2.0.x) or DJD v1.0.4, everything works.
- But when we query with DJD v2.0.0-b2, we get an exception:
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query
at consistency ONE (1 responses were required but only 0 replica responded)
- Afterward, no SELECT query works anymore:
-- all queries via cqlsh fail with rpc_timeout
-- all queries via DJD v1.0.4 fail with the same exception as with v2.0.0-b2
-- these queries worked perfectly before the first SELECT with DJD v2.0.0
- nodetool status shows all nodes still Up and Normal
- nodetool flush still works on all nodes

Only restarting all nodes resolved the issue.
Unfortunately, the log files do not contain any useful information:
Node1: the handshake at 11:28:48 is strange because we did not restart any node
 INFO [MemoryMeter:1] 2013-11-15 11:27:11,724 (line 444) CFS(Keyspace='hector',
ColumnFamily='pdl_caching') liveRatio is 5.06951175012658 (just-counted was 4.902669365509605).
 calculation took 140ms for 57108 columns
 INFO [HANDSHAKE-/] 2013-11-15 11:28:48,550 (line
386) Handshaking version with /
 INFO [RMI TCP Connection(4)-] 2013-11-15 11:32:29,256
(line 734) Enqueuing flush of Memtable-sstable_activity@2142066849(0/0 serialized/live bytes,
24 ops)
 INFO [FlushWriter:76] 2013-11-15 11:32:29,257 (line 328) Writing Memtable-sstable_activity@2142066849(0/0
serialized/live bytes, 24 ops)
Node2: there is a hinted-handoff at 11:30:02...
 INFO [MemoryMeter:1] 2013-11-15 11:25:32,897 (line 444) CFS(Keyspace='hector',
ColumnFamily='pdl_identity') liveRatio is 6.046071792095967 (just-counted was 5.493829833297251).
 calculation took 3ms for 608 columns
 INFO [HintedHandoff:1] 2013-11-15 11:30:02,656 (line 322) Started
hinted handoff for host: 2ce9f0a8-795c-4733-9d52-06057fcc690d with IP: /
 INFO [HintedHandoff:1] 2013-11-15 11:30:12,663 (line 449) Timed
out replaying hints to /; aborting (0 delivered)
 INFO [RMI TCP Connection(6)-] 2013-11-15 11:35:20,096
(line 734) Enqueuing flush of Memtable-hints@581765413(1028/10280 serialized/live bytes, 2
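
To compare the two excerpts above, it can help to merge the per-node entries into one timeline ordered by timestamp. The following is a minimal sketch in Python, assuming the trimmed log layout shown above (level, thread name in brackets, then a timestamp). The names `parse_events` and `merged_timeline` are hypothetical helpers, not part of Cassandra or any driver, and the sample input is limited to a few lines copied from the excerpts.

```python
import re
from datetime import datetime

# Assumed layout of an entry's first line: " LEVEL [thread] YYYY-MM-DD HH:MM:SS,mmm ..."
LOG_LINE = re.compile(
    r"^\s*(?P<level>[A-Z]+) \[(?P<thread>[^\]]*)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
)


def parse_events(node, lines):
    """Return (timestamp, node, thread) for each line that starts a log entry."""
    events = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m:  # wrapped continuation lines do not match and are skipped
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, node, m.group("thread")))
    return events


def merged_timeline(logs_by_node):
    """Merge the events of all nodes into one list sorted by timestamp."""
    events = []
    for node, lines in logs_by_node.items():
        events.extend(parse_events(node, lines))
    return sorted(events)


# A few lines copied from the excerpts above, including one wrapped line.
sample = {
    "Node1": [
        " INFO [MemoryMeter:1] 2013-11-15 11:27:11,724 (line 444) CFS(Keyspace='hector',",
        " INFO [HANDSHAKE-/] 2013-11-15 11:28:48,550 (line",
        "386) Handshaking version with /",
    ],
    "Node2": [
        " INFO [HintedHandoff:1] 2013-11-15 11:30:02,656 (line 322) Started",
        " INFO [HintedHandoff:1] 2013-11-15 11:30:12,663 (line 449) Timed",
    ],
}

timeline = merged_timeline(sample)
for ts, node, thread in timeline:
    print(ts.isoformat(sep=" ", timespec="milliseconds"), node, thread)
```

Sorting on the parsed timestamp rather than on the raw text keeps the ordering correct even if the nodes log in different layouts, although it still assumes the node clocks are reasonably in sync.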

It seems that the first SELECT query with DJD v2.0.0-b2 leaves the cluster in a "pending"/"abnormal"
state, after which it no longer responds to queries.

I know that without logs it will be hard to reproduce.

Thanks and regards,
