Date: Fri, 15 Nov 2013 14:57:24 +0000 (UTC)
From: "Ngoc Minh Vo (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-6352) Cluster does not respond to new SELECT query after a timeout

     [ https://issues.apache.org/jira/browse/CASSANDRA-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ngoc Minh Vo updated CASSANDRA-6352:
------------------------------------

    Attachment: ErrorStack.txt

Description of our table and indexes:

CREATE KEYSPACE myks WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '1'
};

USE myks;

CREATE TABLE data (
  id text,
  date int,
  portfolio text,
  PRIMARY KEY (id)
);

CREATE INDEX ON data(portfolio);
CREATE INDEX ON data(date);

And the query that failed with DJD v2.0.0-b2:

SELECT * FROM data WHERE date=1 AND portfolio='a' ALLOW FILTERING;

> Cluster does not respond to new
> SELECT query after a timeout
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-6352
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6352
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Windows7, C* v2.0.xx, 4-node cluster, JVM 1.7.0_45-b18 Xmx16GB, Datastax Java Driver 1.0.4 and 2.0.0-beta2
>            Reporter: Ngoc Minh Vo
>         Attachments: ErrorStack.txt
>
>
> Hello,
> We have encountered the following issue three times. Here is a description:
> - data are imported via Datastax Java driver (DJD) v2.0.0-b2 with BatchStatement (i.e., a batch of PreparedStatement). The performance is quite impressive.
> - if we query the cluster via cqlsh (C* 2.0.x) and DJD v1.0.4, everything goes well.
> - but when we use DJD v2.0.0-b2, we get an exception:
> {quote}
> com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
> {quote}
> - afterward, no SELECT query works anymore:
> -- all queries via cqlsh fail with rpc_timeout
> -- all queries via DJD v1.0.4 fail with the same exception as with v2.0.0-b2
> -- these queries worked perfectly before the first SELECT with DJD v2.0.0
> - nodetool status shows all nodes still Up and Normal
> - nodetool flush still works on all nodes
> Only a reboot of all nodes solved the issue.
> Unfortunately, we don't have any useful information in the log files:
> Node1: the handshake at 11:28:48 is strange because we didn't reboot any node
> {quote}
> INFO [MemoryMeter:1] 2013-11-15 11:27:11,724 Memtable.java (line 444) CFS(Keyspace='hector', ColumnFamily='pdl_caching') liveRatio is 5.06951175012658 (just-counted was 4.902669365509605).
> calculation took 140ms for 57108 columns
> INFO [HANDSHAKE-/10.30.226.166] 2013-11-15 11:28:48,550 OutboundTcpConnection.java (line 386) Handshaking version with /10.30.226.166
> INFO [RMI TCP Connection(4)-10.30.224.229] 2013-11-15 11:32:29,256 ColumnFamilyStore.java (line 734) Enqueuing flush of Memtable-sstable_activity@2142066849(0/0 serialized/live bytes, 24 ops)
> INFO [FlushWriter:76] 2013-11-15 11:32:29,257 Memtable.java (line 328) Writing Memtable-sstable_activity@2142066849(0/0 serialized/live bytes, 24 ops)
> {quote}
> Node2: there is a hinted handoff at 11:30:02...
> {quote}
> INFO [MemoryMeter:1] 2013-11-15 11:25:32,897 Memtable.java (line 444) CFS(Keyspace='hector', ColumnFamily='pdl_identity') liveRatio is 6.046071792095967 (just-counted was 5.493829833297251). calculation took 3ms for 608 columns
> INFO [HintedHandoff:1] 2013-11-15 11:30:02,656 HintedHandOffManager.java (line 322) Started hinted handoff for host: 2ce9f0a8-795c-4733-9d52-06057fcc690d with IP: /10.30.227.8
> INFO [HintedHandoff:1] 2013-11-15 11:30:12,663 HintedHandOffManager.java (line 449) Timed out replaying hints to /10.30.227.8; aborting (0 delivered)
> INFO [RMI TCP Connection(6)-10.30.224.229] 2013-11-15 11:35:20,096 ColumnFamilyStore.java (line 734) Enqueuing flush of Memtable-hints@581765413(1028/10280 serialized/live bytes, 2 ops)
> {quote}
> It seems that the first SELECT query with DJD v2.0.0-b2 leaves the cluster in a "pending"/"abnormal" state in which it no longer responds to subsequent queries.
> I know that without logs it will be hard to reproduce.
> Thanks and regards,
> Minh



--
This message was sent by Atlassian JIRA
(v6.1#6144)