cassandra-user mailing list archives

From "Jens-U. Mozdzen" <>
Subject table scan speeds
Date Sun, 04 Jan 2015 21:58:41 GMT
Hi *,

we're experimenting with a 4-node Cassandra setup, aggregating input  
streams and persisting time-bucket based aggregation results.

One test was to create "full dumps" of a table with around 32 million  
existing entries. The keyspace was created with a replication factor  
of 2 and the CF has 40 sstables across the four nodes.

The output of "describe table mykeyspace.mycf;":

--- cut here ---
CREATE TABLE mykeyspace.mycf (
     a timestamp,
     b bigint,
     c bigint,
     d bigint,
     e boolean,
     f text,
     g bigint,
     h text,
     i bigint,
     j text,
     k text,
     l bigint,
     m text,
     n text,
     o text,
     p text,
     q text,
     r bigint,
     PRIMARY KEY ((a, b, c), d, e, f)
) WITH bloom_filter_fp_chance = 0.01
     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
     AND comment = ''
     AND compaction = {'min_threshold': '4', 'class': …, 'max_threshold': '32'}
     AND compression = {'sstable_compression': …}
     AND dclocal_read_repair_chance = 0.1
     AND default_time_to_live = 0
     AND gc_grace_seconds = 0
     AND max_index_interval = 2048
     AND memtable_flush_period_in_ms = 0
     AND min_index_interval = 128
     AND read_repair_chance = 0.0
     AND speculative_retry = '99.0PERCENTILE';
--- cut here ---

Cassandra is at version 2.1.2.

The test program is written in Java and uses the Datastax driver  
2.1.3, issuing the SimpleStatement "select * from mykeyspace.mycf;",  
reading each row (without further local processing), counting the  
number of rows returned and measuring the elapsed time.
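
For reference, here is a minimal sketch of such a test harness. The counting/timing loop is real, runnable code; the driver calls are shown as comments (with a placeholder contact point) so the fragment stays self-contained. The driver's ResultSet implements Iterable<Row>, so the same loop works against a live cluster.

```java
import java.util.Arrays;

public class FullScanSketch {

    // Count rows from any Iterable and return the average time per row in
    // milliseconds. Iterating the driver's ResultSet fetches further pages
    // transparently, so this loop measures the whole paged scan.
    static double msPerRow(Iterable<?> rows) {
        long t0 = System.nanoTime();
        long n = 0;
        for (Object ignored : rows) {
            n++;  // no further local processing, as in the test described above
        }
        long elapsedNanos = System.nanoTime() - t0;
        return n == 0 ? 0.0 : (elapsedNanos / 1e6) / n;
    }

    public static void main(String[] args) {
        // Against a real cluster (contact point is a placeholder):
        //   Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
        //   Session session = cluster.connect();
        //   SimpleStatement stmt = new SimpleStatement("select * from mykeyspace.mycf;");
        //   stmt.setFetchSize(500);
        //   System.out.println(msPerRow(session.execute(stmt)));
        // Stub data keeps this sketch self-contained:
        System.out.printf("%.4f ms/row%n", msPerRow(Arrays.asList(1, 2, 3)));
    }
}
```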

Across multiple invocations, the time per row is about 10.8  
milliseconds, which adds up to quite some elapsed time for all the  
32 million rows.
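
Taking the 10.8 ms figure at face value, a back-of-the-envelope check of what the full scan multiplies out to:

```java
public class ScanMath {
    // wall-clock seconds for a full scan at a fixed per-row cost
    static double totalSeconds(long rows, double msPerRow) {
        return rows * msPerRow / 1_000.0;
    }

    public static void main(String[] args) {
        // 32 million rows at ~10.8 ms each
        System.out.printf("%.0f s%n", totalSeconds(32_000_000L, 10.8));  // 345600 s
    }
}
```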

When running the query against a single column instead of all columns,  
the average time per record is still at a similar level.

What puzzles me most: During the query (no other activity is running  
on the Cassandra nodes), I see plenty of CPU idle and close to no  
iowait on all four nodes.

Depending on the fetch size set via SimpleStatement.setFetchSize(), we  
run into more or less (up to severe) GC trouble, so we settled on a  
size of 500 (1000 works, too), keeping the GC stress level low.  
Looking at the Cassandra logs, I do see some GC activity (we're  
currently using G1GC), but mostly around a few hundred milliseconds  
and not too often - so GC pauses should not be the issue. According to  
jconsole, the JVMs have plenty of memory left.
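
One way to quantify the fetch-size trade-off (my framing, not something established in this thread): with the driver's default synchronous paging, iteration blocks once per page, so the number of page fetches bounds how much round-trip latency gets folded into the measured per-row time.

```java
public class PagingMath {
    // Number of page fetches (= blocking round trips under the driver's
    // default synchronous paging) needed to iterate `rows` results.
    static long pageFetches(long rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize;  // ceiling division
    }

    public static void main(String[] args) {
        // 32 million rows at a fetch size of 500
        System.out.println(pageFetches(32_000_000L, 500));  // 64000 pages
    }
}
```

At 64,000 sequential page fetches, even a modest per-page latency accumulates; a larger fetch size trades fewer round trips for more GC pressure, which matches the behaviour described above.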

I have tried running the query with tracing, but the number of trace  
entries was overwhelming. Scanning the part that fit into the screen  
buffer, nothing obvious turned up.

Given that the servers don't seem to be under resource constraints,  
adding more servers will likely not improve the response time.

How could I shed some light on why the query does not drive the  
servers to their (CPU, I/O) limits?


PS: Doing full scans seems to be an anti-pattern - selecting  
individual records from our tables works sufficiently fast ATM. We're  
looking into changes to the solution architecture so we can avoid  
dumping such big tables, but nevertheless I'm curious what limit we're  
hitting... what's restricting the flow of data?
