cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Avi Kivity <...@scylladb.com>
Subject Re: Slow paging query on Cassandra.
Date Sat, 27 Jan 2018 11:58:36 GMT
Does the last_update_date constraint filter out a lot of rows? In that 
case the server may be reading a large number of rows, only to throw 
them away since they get filtered out.


If you apply the filter on the client side, you shouldn't see timeouts 
(but overall the process will be slower since you have to transfer more 
data).


btw, from the logs it looks like the client is multi-threaded, there are 
different token ranges in the same time period.


On 01/26/2018 10:39 PM, Juan Manuel Alonso wrote:
> Hi guys,
>
> I'm having some trouble while using paged queries on Cassandra's Java 
> driver (version 3.3.2). I'm using Cassandra 3.11.0.
>
> I have to fetch a page of data from the DB, then make some trivial 
> changes, and then update these rows.
>
> A simplified version of the code i'm running would be:
>
>                     Integer rowCounter = 0;
>                     Statement selectQuery = 
> QueryBuilder.select()...setFetchSize(pageSize)...;
>                     ResultSet result = 
> cassandraSession.execute(selectQuery);
>                     List<MyClass> mappedResults = new ArrayList<>();
>                     for (Row row : result) {
>                         rowCounter++;
>                         mappedResults.add(map(row));
>                         if (rowCounter % pageSize == 0) {
>
>                             List<MyClass> resultsToUpdate = 
> modifyData(mappedResults);
>                             for (MyClass resultToUpdate : 
> resultsToUpdate){
>                                 Statement updateQuery = 
> QueryBuilder.update(KEYSPACE, tableName)...;
> cassandraSession.execute(query);
>                             }
> TimeUnit.SECONDS.sleep(sleepSeconds); //Sleep for a few seconds to let 
> the DB... breathe
>                         }
>
>                     }
>
> I'm using consistency level ONE on both select and update queries, the 
> value of sleepSeconds is 5 and the pageSize is 47.
>
> My problem is that I have to use very small page sizes, otherwise 
> queries start to timeout on Cassandra.
>
> There is only one thread running this long update process, but when i 
> check Cassandra's debug.log, it looks like this:
>
> ...
> DEBUG [ScheduledTasks:1] 2018-01-26 12:43:09,221 
> MonitoringTask.java:173 - 55 operations were slow in the last 5001 msecs:
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 1940709131428868672 AND token(id) <= 1976881771356013545 
> LIMIT 47>, time 1232 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > -31240603717813337 AND token(id) <= 93066413544676618 
> LIMIT 47>, time 672 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > -2746601914911102981 AND token(id) <= -2679503374406295369 
> LIMIT 47>, time 722 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 8697901506577253630 AND token(id) <= 8756251242481074941 
> LIMIT 47>, time 1737 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > -2566441277217350674 AND token(id) <= -2410488306633473620 
> LIMIT 47>, time 997 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 5186947162422827855 AND token(id) <= 5251256039266177164 
> LIMIT 47>, time 1619 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 523415566358416448 AND token(id) <= 558165594730430519 
> LIMIT 47>, time 793 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > -6313110054894254305 AND token(id) <= -6149701678898756666 
> LIMIT 47>, time 510 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 133117363640100699 AND token(id) <= 326755086351479456 
> LIMIT 47>, time 594 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > -5773756298752768296 AND token(id) <= -5672224259310839216 
> LIMIT 47>, time 631 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 9138868762246577790 AND token(id) <= 9184809921750217730 
> LIMIT 47>, time 1680 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 1481347618188085389 AND token(id) <= 1529429375374220120 
> LIMIT 47>, time 1337 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > -3179570044050246190 AND token(id) <= -2975237200717735765 
> LIMIT 47>, time 773 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > -1992364373944487162 AND token(id) <= -1754930707218513982 
> LIMIT 47>, time 793 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 7461256765584395144 AND token(id) <= 7513523865647503158 
> LIMIT 47>, time 1569 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 2199511646454841639 AND token(id) <= 2235092311035533306 
> LIMIT 47>, time 1157 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 5981009014177068366 AND token(id) <= 6193847522724693984 
> LIMIT 47>, time 1549 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 6587824379305518475 AND token(id) <= 6941621185441223079 
> LIMIT 47>, time 1491 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > -2888016351766682341 AND token(id) <= -2832466742668731344 
> LIMIT 47>, time 642 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 4599678137499867302 AND token(id) <= 4681791682494977137 
> LIMIT 47>, time 1222 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 4097891947569113599 AND token(id) <= 4205652216148641874 
> LIMIT 47>, time 1421 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 1313069522772322867 AND token(id) <= 1345209653063462051 
> LIMIT 47>, time 1042 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 687538582456175035 AND token(id) <= 714387656474794527 
> LIMIT 47>, time 1262 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 3166906839309766962 AND token(id) <= 3220931935607646481 
> LIMIT 47>, time 1589 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 5661754624946584572 AND token(id) <= 5764445628939515857 
> LIMIT 47>, time 1491 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > 6988693948300673349 AND token(id) <= 7029878484744478913 
> LIMIT 47>, time 1579 msec - slow timeout 500 msec/cross-node
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND 
> token(id) > -601653023540100942 AND token(id) <= -570711602903916229 
> LIMIT 47>, time 1032 msec - slow timeout 500 msec/cross-node
> ... (5 were dropped)
> DEBUG [ScheduledTasks:1] 2018-01-26 12:43:14,222 
> MonitoringTask.java:173 - 52 operations were slow in the last 4998 msecs:
> ...
>
>
> I can't understand why my process is making lots of SELECT queries in 
> just 5 seconds, when it should be making one fetch, updating the rows, 
> and then sleeping for five seconds. If i check my application logs, 
> everything seems to be correct, there is only one fetch every five 
> seconds.
>
> I tried running the same code on another environment (with the same 
> volume of data). It runs much faster and the debug.log looks correct:
> ...
> DEBUG [ScheduledTasks:1] 2018-01-26 12:32:25,574 
> MonitoringTask.java:173 - 1 operations were slow in the last 5003 msecs:
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-03-13 AND 
> token(id) > token(AVrJ6PqGIUe_WZqLXGLv) LIMIT 200>, time 517 msec - 
> slow timeout 500 msec/cross-node
> DEBUG [ScheduledTasks:1] 2018-01-26 12:32:35,575 
> MonitoringTask.java:173 - 1 operations were slow in the last 5000 msecs:
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-03-13 AND 
> token(id) > token(AVrJ6GJqVlin-KDUvTKL) LIMIT 200>, time 680 msec - 
> slow timeout 500 msec/cross-node
> DEBUG [ScheduledTasks:1] 2018-01-26 12:32:40,577 
> MonitoringTask.java:173 - 1 operations were slow in the last 5001 msecs:
> <SELECT * FROM keyspace.table WHERE last_update_date = 2017-03-13 AND 
> token(id) > token(AVrJ6xVrVlin-KDUvXep) LIMIT 200>, time 647 msec - 
> slow timeout 500 msec/cross-node
> ...
>
> Any thoughts on why is this happening would be highly appreciated.
>
> Thanks in advance.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


Mime
View raw message