cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aaron morton <aa...@thelastpickle.com>
Subject Re: Consistency Level
Date Thu, 05 Jan 2012 11:18:58 GMT
I missed a ! in the code :) The query will break the token ring into ranges based on the node
tokens and then find the UP nodes for each range. 

I've taken another walk through the code, the logs helped. 

In short, you do not have enough UP nodes to support an indexed get at CL ONE. It is working
by design and you *should* have gotten an UnavailableException returned. There must be CL
up replicas for each token range. In your test node 200.190 is down and so the next node,
with RF 2 this means there are no replicas for the range. The log line below is logged just
before the UnavalableException is raised

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes
 do not satisfy ConsistencyLevel (1 required)


You will need at least every RF'th node UP. Another way to look at is if you have RF contiguous
nodes DOWN you cannot perform an indexed get. 

If you are interested this is what the logs are saying…

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 976) scan ranges
are [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
There are 4 token ranges to query, i.e. we have to make 4 reads to query over the whole cluster.


> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76) Blockfor/repair
is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading
org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading
org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.202.118
Starting to read for the first token range. A bug in 0.8.6 makes it read from 202.118 when
it does not need to. 


> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1550) Primary
scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1563) Expanding
slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1605) Scanning
index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line 123) collecting
0 of 100: 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line 1617) fetched
ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325704860925009,])
Scanned the secondary index on 200.130 and found an entry for the row key 7a7a32323636373030303438303031
matched the index expression. 

> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line 46) Sending
RangeSliceReply{rows=} to 171@/172.16.200.130
Returning ZERO rows for the query result. Because the row key we read above has the token
111413491371349413596553235966977111575L which is not in the first token range from above
(0,42535295865117307932921825928971026432] and this is the range we are interested in now.


> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76) Blockfor/repair
is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line 1003) reading
org.apache.cassandra.db.IndexScanCommand@10eeb26 from /172.16.202.118
Processing the second range now. There is only one node up for this range, 202.118

> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line
48) Processing response on a callback from 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line
48) Processing response on a callback from 173@/172.16.202.118
Got the callback from 202.118 for both the query ranges. 

The logs on 202.118 show the same local local query. But I'm a little confused as to why the
row exists on node 2 at all.

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76) Blockfor/repair
is 1/false; setting up requests to
Moving on, time to process the third token range (85070591730234615865843651857942052864,127605887595351923798765477786913079296]

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes
 do not satisfy ConsistencyLevel (1 required)

Oh noes there are no nodes available for that token range. Throw UnavailableException

Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 10:52 AM, Kamal Bahadur wrote:

> Hi Aaron, 
> 
> Thanks for your response!
> 
> I re-ran the test case # 5. (Node 1 & 2 running, Node 3 & 4 down, Node 1 contains
the data, CL ONE and RF 2). I was connected to Node 1 while I ran the test. I still did not
get any data. See below logs:
> 
> -----------------------------------------------------------
> nodetool ring:
> -----------------------------------------------------------
> 
> Address         DC          Rack        Status State   Load            Owns    Token
                                
>                                                                                127605887595351923798765477786913079296
> xxx.16.200.130  datacenter1 rack1       Up     Normal  584.97 KB       25.00%  0    
                                
> xxx.16.202.118  datacenter1 rack1       Up     Normal  570.91 KB       25.00%  42535295865117307932921825928971026432
> xxx.16.200.190  datacenter1 rack1       Down   Normal  593.51 KB       25.00%  85070591730234615865843651857942052864
> xxx.16.200.217  datacenter1 rack1       Down   Normal  617.23 KB       25.00%  127605887595351923798765477786913079296
> 
> -----------------------------------------------------
> Logs on Node 1
> -----------------------------------------------------
> DEBUG [Thread-7] 2012-01-04 13:44:00,665 IncomingTcpConnection.java (line 148) Version
is now 2
>  INFO [HintedHandoff:1] 2012-01-04 13:44:00,850 HintedHandOffManager.java (line 314)
Endpoint /172.16.200.190 died before hint delivery, aborting
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 303)
Checking remote schema before delivering hints
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 295)
schema for /172.16.200.217 matches local schema
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 309)
Sleeping 17296ms to stagger hint delivery
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,857 CassandraServer.java (line 670) scan
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 889) restricted
ranges for query [-1,-1] are [[-1,0], (0,42535295865117307932921825928971026432], (42535295865117307932921825928971026432,85070591730234615865843651857942052864],
(85070591730234615865843651857942052864,127605887595351923798765477786913079296], (127605887595351923798765477786913079296,-1]]
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 976) scan ranges
are [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
> DEBUG [Thread-7] 2012-01-04 13:44:00,876 IncomingTcpConnection.java (line 148) Version
is now 2
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76) Blockfor/repair
is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading
org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading
org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.202.118
> DEBUG [WRITE-/172.16.202.118] 2012-01-04 13:44:00,884 OutboundTcpConnection.java (line
162) attempting to connect to /172.16.202.118
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1550) Primary
scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1563) Expanding
slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1605) Scanning
index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line 123) collecting
0 of 100: 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line 1617) fetched
ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325704860925009,])
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line 46) Sending
RangeSliceReply{rows=} to 171@/172.16.200.130
> DEBUG [RequestResponseStage:1] 2012-01-04 13:44:00,895 ResponseVerbHandler.java (line
48) Processing response on a callback from 171@/172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76) Blockfor/repair
is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line 1003) reading
org.apache.cassandra.db.IndexScanCommand@10eeb26 from /172.16.202.118
> DEBUG [Thread-7] 2012-01-04 13:44:00,912 IncomingTcpConnection.java (line 148) Version
is now 2
> DEBUG [Thread-7] 2012-01-04 13:44:00,913 IncomingTcpConnection.java (line 148) Version
is now 2
> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line
48) Processing response on a callback from 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line
48) Processing response on a callback from 173@/172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76) Blockfor/repair
is 1/false; setting up requests to
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes
 do not satisfy ConsistencyLevel (1 required)
> -----------------------------------------------------
> Logs on Node 2
> -----------------------------------------------------
> DEBUG [Thread-6] 2012-01-04 13:44:00,886 Gossiper.java (line 197) Setting version 2 for
/172.16.200.130
> DEBUG [Thread-6] 2012-01-04 13:44:00,886 IncomingTcpConnection.java (line 105) set version
for /172.16.200.130 to 2
> DEBUG [Thread-6] 2012-01-04 13:44:00,895 IncomingTcpConnection.java (line 148) Version
is now 2
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1550) Primary
scan clause is member
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1550) Primary
scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1563) Expanding
slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1563) Expanding
slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1605) Scanning
index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,907 ColumnFamilyStore.java (line 1605) Scanning
index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,909 SliceQueryFilter.java (line 123) collecting
0 of 100: 7a7a32323636373030303438303031:false:0@1325615757889011
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,910 ColumnFamilyStore.java (line 1617) fetched
ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325615757889011,])
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,910 SliceQueryFilter.java (line 123) collecting
0 of 100: 7a7a32323636373030303438303031:false:0@1325615757889011
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 ColumnFamilyStore.java (line 1617) fetched
ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325615757889011,])
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java (line 46) Sending
RangeSliceReply{rows=} to 172@/172.16.200.130
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java (line 46) Sending
RangeSliceReply{rows=} to 173@/172.16.200.130
> DEBUG [Thread-4] 2012-01-04 13:44:01,664 IncomingTcpConnection.java (line 148) Version
is now 2
> 
> 
> 
> On Wed, Jan 4, 2012 at 1:15 AM, aaron morton <aaron@thelastpickle.com> wrote:
> I've not spent much time with the secondary indexes, so a couple of questions. 
> 
> Whats is the output of nodetool ring ? 
> Which node were you connected to when you did the get ?
> If you enable DEBUG logging what do the log messages from StorageProxy say that contain
the string "scan ranges are" and "reading .. from ..."
>  
> Now for the wild guessing….It's working as designed for a CL ONE request. Looking at
test case 5 and *assuming* you were connected to node 2 this is what I think is happening:
A get indexed slice query without a start key does not know which nodes will contain the data.
Reading the code it will consider the replicas for the minimum token as nodes to send the
query to, for a CL ONE query it will only use one. If you were connected to node 2 the query
would have executed only on node 2. 
> 
> This is where I get confused. What happens if you have 50 nodes, with RF 3, and you execute
a get_indexed_slice at QUOURM with no start_key and the only rows that satisfy the query exist
on nodes 47, 48 and 49. So they are a long way away from the minimum token, assuming SimpleStrategy
and well ordered token ring. 
> 
> I think I've missed something, anyone ?
> Cheers
>  
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 4/01/2012, at 9:44 AM, Kamal Bahadur wrote:
> 
>> Hi Peter,
>> 
>> To test, I wiped out all the data from Cassandra and inserted just one record. The
row key is "7a7a32323636373030303438303031". I used getendpoints to see where my data is and
double checked the same using sstable2json command.
>> 
>> Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4 nodes cluster.
I used cassandra-cli to query the data by using one of the secondary index but following are
my results:
>> 
>> Test	 Node 1	 Node 2	 Node 3	 Node 4	 Got data back?
>> 1	Up	Up	Up	Up	Yes
>> 2	Up	Up	Up	
>> Yes
>> 3	Up	Up	
>> Up	Yes
>> 4	
>> Up	Up	Up	Yes
>> 5	Up	Up	
>> 
>> No
>> 6	
>> 
>> Up	Up	No
>> 7	Up	
>> 
>> Up	No
>> 
>> 
>> It turns out that even though my consistency level is ONE, since I am using secondary
index to query the data, at least 3 nodes has to be running. And out of these 3 running nodes,
it works even if one nodes contains the data.
>> 
>> Somewhere in the mailing I read that "Iterating through all of the rows matching
an index clause on your cluster is guaranteed to touch N/RF of the nodes in your cluster,
because each node only knows about data that is indexed locally."
>> 
>> I am not sure what N/RF means in my case. Does it mean 4/2 = 2? where 4 is the number
of nodes and 2 is the RF. If it is 2, why is it not returning any data when the two nodes
that contains the data is running (test #7)?
>> 
>> For my use case, I have to have a RF of 2 and should be able to query using secondary
index with a CL of ONE. Is this possible when 2 nodes are down in a 4 nodes cluster? Is there
any limitations on using secondary index?
>> 
>> Thanks in advance.
>> 
>> Thanks,
>> Kamal
>> 
>> On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller <peter.schuller@infidyne.com>
wrote:
>> > Thanks for the response Peter! I checked everything and it look good to me.
>> >
>> > I am stuck with this for almost 2 days now. Has anyone had this issue?
>> 
>> While it is certainly possible that you're running into a bug, it
>> seems unlikely to me since it is the kind of bug that would affect
>> almost anyone if it is failing with Unavailable due to unrelated (not
>> in replica sets) nodes being down.
>> 
>> Can you please post back with (1) the ring layout ('nodetool ring'),
>> and (2) the exact row key that you're testing with?
>> 
>> You might also want to run with DEBUG level (modify
>> log4j-server.properties at the top) and the strategy (assuming you are
>> using NetworkTopologyStrategy) will log selected endpoints, and
>> confirm that it's indeed picking endpoints that you think it should
>> based on getendpoints.
>> 
>> --
>> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>> 
> 
> 


Mime
View raw message