Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Tue, 1 Feb 2011 20:37:29 +0000 (UTC)
From: "Aaron Morton (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: 
 <404655235.3573.1296592649244.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <774183599.403.1296502828996.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] Commented: (CASSANDRA-2081) Consistency QUORUM does not work
 anymore (hector:Could not fullfill request on this host)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989375#comment-12989375 ] 

Aaron Morton commented on CASSANDRA-2081:
-----------------------------------------

My understanding is that the 0.19 node is sending read requests to the 0.1, 0.2 and 0.3 nodes and only getting a reply from the 0.1 node before timing out. The 0.1 node is the first node the request is sent to, so it gets the data request; the others get digest requests.

The timeout is the rpc_timeout, and can be seen here...

DEBUG [pool-1-thread-1] 2011-02-01 11:48:28,949 ReadCallback.java (line 58) ReadCallback blocking for 2 responses
...10 seconds... 
DEBUG [pool-1-thread-1] 2011-02-01 11:48:38,950 CassandraServer.java (line 483) ... timed out
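
As a rough illustration of what "blocking for 2 responses" means (this is a sketch, not the actual ReadCallback code): assuming a replication factor of 3, which is my guess from the three replicas involved, QUORUM needs floor(3 / 2) + 1 = 2 replies, and the coordinator waits for them until the timeout expires. The class and method names below are made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not Cassandra's ReadCallback implementation.
class QuorumWaitSketch {
    // A quorum is a majority of the replicas: floor(RF / 2) + 1.
    static int quorumFor(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Waits until blockFor responses have arrived or the timeout elapses.
    // Returns true only if enough responses arrived in time.
    static boolean awaitResponses(int blockFor, int responsesReceived, long timeoutMillis)
            throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(blockFor);
        for (int i = 0; i < responsesReceived; i++) {
            latch.countDown(); // simulate one replica reply
        }
        return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        int blockFor = quorumFor(3); // RF = 3 is an assumption
        System.out.println("blocking for " + blockFor + " responses");
        // Only one reply (the data request to 0.1) arrives, so the wait times out.
        System.out.println("satisfied: " + awaitResponses(blockFor, 1, 100));
    }
}
```

With a single reply the wait is never satisfied, which matches the "timed out" line above.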

What's happening on the 0.2 and 0.3 nodes at this point? Are they logging errors or WARN messages about dropped messages? Can you see any logs about processing messages from the 0.19 node? I'm not sure the down 0.18 node is a factor here.

The client should be retrying when it gets a timeout, which I think you said Hector was doing. 
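
For reference, the kind of client-side retry I mean looks roughly like the sketch below. It does not use Hector's API: the request is a plain java.util.concurrent.Callable, java.util.concurrent.TimeoutException stands in for whatever timeout error the client raises, and withRetries is a made-up helper name.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; this is not how Hector implements its retries.
class RetryOnTimeoutSketch {
    // Runs the request, retrying only when it times out, up to maxAttempts tries.
    static <T> T withRetries(Callable<T> request, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        TimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (TimeoutException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt timed out
    }
}
```

A real client would probably also switch to a different host between attempts rather than retrying the same coordinator.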

 
> Consistency QUORUM does not work anymore (hector:Could not fullfill request on this host)
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2081
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2081
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: linux, hector + cassandra
>            Reporter: Thibaut
>            Priority: Blocker
>             Fix For: 0.7.1
>
>
> I'm using apache-cassandra-2011-01-28_20-06-01.jar and hector 7.0.25.
> Using consistency level QUORUM won't work anymore (tested it on read). Consistency level ONE still works though.
> I have tried this with one dead node in my cluster.
> If I restart cassandra with an older svn revision (apache-cassandra-2011-01-28_20-06-01.jar), I can access the cluster with consistency level QUORUM again, while still using apache-cassandra-2011-01-28_20-06-01.jar and hector 7.0.25 in my application.
> 11/01/31 19:54:38 ERROR connection.CassandraHostRetryService: Downed intr1n18(192.168.0.18):9160 host still appears to be down: Unable to open transport to intr1n18(192.168.0.18):9160 , java.net.NoRouteToHostException: No route to host
> 11/01/31 19:54:38 INFO connection.CassandraHostRetryService: Downed Host retry status false with host: intr1n18(192.168.0.18):9160
> 11/01/31 19:54:45 ERROR connection.HConnectionManager: Could not fullfill request on this host CassandraClient<intr1n11:9160-483>
> intr1n11 is marked as up however and I can also access the node through the cassandra cli.
> 192.168.0.1     Up     Normal  8.02 GB         5.00%   0cc
> 192.168.0.2     Up     Normal  7.96 GB         5.00%   199
> 192.168.0.3     Up     Normal  8.24 GB         5.00%   266
> 192.168.0.4     Up     Normal  4.94 GB         5.00%   333
> 192.168.0.5     Up     Normal  5.02 GB         5.00%   400
> 192.168.0.6     Up     Normal  5 GB            5.00%   4cc
> 192.168.0.7     Up     Normal  5.1 GB          5.00%   599
> 192.168.0.8     Up     Normal  5.07 GB         5.00%   666
> 192.168.0.9     Up     Normal  4.78 GB         5.00%   733
> 192.168.0.10    Up     Normal  4.34 GB         5.00%   7ff
> 192.168.0.11    Up     Normal  5.01 GB         5.00%   8cc
> 192.168.0.12    Up     Normal  5.31 GB         5.00%   999
> 192.168.0.13    Up     Normal  5.56 GB         5.00%   a66
> 192.168.0.14    Up     Normal  5.82 GB         5.00%   b33
> 192.168.0.15    Up     Normal  5.57 GB         5.00%   c00
> 192.168.0.16    Up     Normal  5.03 GB         5.00%   ccc
> 192.168.0.17    Up     Normal  4.77 GB         5.00%   d99
> 192.168.0.18    Down   Normal  ?               5.00%   e66
> 192.168.0.19    Up     Normal  4.78 GB         5.00%   f33
> 192.168.0.20    Up     Normal  4.83 GB         5.00%   ffffffffffffffff

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira