cassandra-user mailing list archives

From Phil Luckhurst <phil.luckhu...@powerassure.com>
Subject RPC timeout paging secondary index query results
Date Wed, 11 Jun 2014 09:24:02 GMT
Is paging through the results of a secondary index query broken in Cassandra
2.0.7 or are we doing something wrong?

We have a table with a few hundred thousand records and an indexed
low-cardinality column. The relevant bits of the table definition are shown
below:

CREATE TABLE measurement (
measurement_id uuid,
controller_id uuid,
...
PRIMARY KEY (measurement_id)
);

CREATE INDEX ON measurement(controller_id);

We originally got the timeout when trying to page through the results of a
'SELECT * FROM measurement WHERE controller_id = xxx-xxx-xxx' query using
the Java driver 2.0.2, but we can also consistently reproduce the problem
with CQLSH.

In CQLSH we can start paging through the measurement_id entries 1000 at a
time for a specific controller_id by using the token() function, e.g.

SELECT measurement_id, token(measurement_id) FROM measurement WHERE
controller_id = 0167bfa6-0918-47ba-8b65-dcccecbcd79f AND
token(measurement_id) >= -8975013189301561463 LIMIT 1000;
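
A minimal sketch of the token-based paging loop described above, in Python.
The names page_by_token, fetch_page and make_fake_fetch are hypothetical and
are not part of any driver API; fetch_page stands in for running the SELECT
above with a LIMIT. The sketch also assumes that every page after the first
uses a strict '>' on the last token seen, rather than '>=', so that the
boundary row is not returned twice.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One result row: (measurement_id, token(measurement_id)).
Row = Tuple[str, int]

# Hypothetical callback: given the last token seen (None for the first page)
# and a limit, return up to `limit` rows ordered by token.
FetchPage = Callable[[Optional[int], int], List[Row]]


def page_by_token(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[Row]:
    """Yield all rows by repeatedly asking for the page after the last token."""
    last_token: Optional[int] = None
    while True:
        rows = fetch_page(last_token, page_size)
        if not rows:
            return
        yield from rows
        if len(rows) < page_size:
            # A short page means there is nothing left to fetch.
            return
        # Assumption: the next query filters on token(...) > last_token.
        last_token = rows[-1][1]


def make_fake_fetch(rows: List[Row]) -> FetchPage:
    """In-memory stand-in for the real query, for illustration only."""
    ordered = sorted(rows, key=lambda r: r[1])

    def fetch(after_token: Optional[int], limit: int) -> List[Row]:
        matching = [r for r in ordered if after_token is None or r[1] > after_token]
        return matching[:limit]

    return fetch
```

With a real driver, fetch_page would execute the query for one controller_id
and return the measurement_id and token columns of each row.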

This works for 8 queries but consistently fails with an RPC timeout for rows
8000-9000. If from row 8000 we start using a smaller LIMIT size we can get
to approx row 8950 but at that point we get the timeout even if we set
'LIMIT 10'. Looking at the trace output, it seems to be doing
thousands of queries on the index table for every request even if we set
'LIMIT 1' - almost as if it's starting from the beginning of the index for
each page request?

It all seems very similar to CASSANDRA-5975
<https://issues.apache.org/jira/browse/CASSANDRA-5975>, but that is marked
as resolved in Cassandra 2.0.1. For example, this query for a single record

SELECT measurement_id, token(measurement_id) FROM measurement WHERE
controller_id = 0167bfa6-0918-47ba-8b65-dcccecbcd79f AND
token(measurement_id) = -8947401969768490998;

works fine and produces approx 60 lines of trace output. If we simply add
'LIMIT 1' to the statement the trace output is approx 70,000 lines!

It looks like we may have to give up on using secondary indexes but it would
be nice to know if what we are trying to do is correct and should work.

Thanks
Phil

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RPC-timeout-paging-secondary-index-query-results-tp7595078.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.
