cassandra-user mailing list archives

From Andras Szerdahelyi <andras.szerdahe...@ignitionone.com>
Subject Re: row cache re-fill very slow
Date Mon, 19 Nov 2012 21:39:48 GMT
Aaron,

> What version are you on?

1.1.5

> Do you know how many rows were loaded?

INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java (line 451) completed loading (5175655 ms; 13259976 keys) row cache
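
For reference, the figures in that log line can be turned into a load rate with a few lines of Python. This is only a sketch: the regular expression is an assumption based on the "completed loading (%d ms; %d keys) row cache" pattern quoted in this thread, and `load_stats` is a hypothetical helper name, not part of Cassandra.

```python
import re

# Log line as posted above (the two wrapped lines joined into one).
LOG_LINE = (
    "INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java "
    "(line 451) completed loading (5175655 ms; 13259976 keys) row cache"
)

# Assumed shape of the message: "completed loading (<ms> ms; <keys> keys) row cache".
PATTERN = re.compile(r"completed\s+loading \((\d+) ms; (\d+) keys\) row cache")


def load_stats(line):
    """Return (elapsed_seconds, keys, keys_per_second) for a 'completed loading' line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'completed loading' row cache line")
    elapsed_ms, keys = int(match.group(1)), int(match.group(2))
    seconds = elapsed_ms / 1000.0
    return seconds, keys, keys / seconds


seconds, keys, rate = load_stats(LOG_LINE)
print("%.1f minutes, %d keys, %.0f keys/sec" % (seconds / 60, keys, rate))
```

For the line above this works out to roughly 86 minutes for about 13.3 million keys, i.e. on the order of 2,500 keys per second.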

> In both cases I do not believe the cache is stored in token (or key) order.

Am I getting this right: the row keys are read, and the rows are retrieved from SSTables, in
the order their keys appear in the cache file?
Would something like iterating over the SSTables instead, and throwing the rows that need to
be in there at the cache, be feasible? If the SSTables themselves are written sequentially at
compaction time, which is how I remember it, then SSTable-sized sequential reads with a filter
(a bloom filter for the row cache? :-)) must be faster than reading from all across the column
family (I have HDDs and about 1k SSTables).
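
To make the two access patterns concrete, here is a toy Python sketch. It is not Cassandra code and rests on several assumptions: each SSTable is modelled as a list of `(key, columns)` pairs in on-disk order, the list of SSTables runs oldest to newest so that later columns simply overwrite earlier ones (real reconciliation uses timestamps and tombstones), and an exact `set` stands in for the proposed filter. Both function names are hypothetical.

```python
def refill_by_key_order(saved_keys, sstables):
    """Look up each saved key in turn; every key may touch every SSTable."""
    # A dict per SSTable stands in for an index lookup plus a disk seek.
    indexes = [dict(table) for table in sstables]
    cache = {}
    for key in saved_keys:
        row = {}
        for index in indexes:
            if key in index:
                row.update(index[key])  # merge row fragments, newest last
        if row:
            cache[key] = row
    return cache


def refill_by_sstable_scan(saved_keys, sstables):
    """Scan each SSTable once, front to back, keeping only the wanted rows."""
    wanted = set(saved_keys)  # exact membership test, unlike a bloom filter
    cache = {}
    for table in sstables:
        for key, columns in table:  # sequential pass in on-disk order
            if key in wanted:
                cache.setdefault(key, {}).update(columns)
    return cache


# Tiny made-up data set: row "a" is split across two SSTables.
sstables = [
    [("a", {"c1": 1}), ("b", {"c1": 2}), ("d", {"c1": 4})],
    [("a", {"c2": 10}), ("c", {"c1": 3})],
]
saved_keys = ["d", "a", "c"]

by_key = refill_by_key_order(saved_keys, sstables)
by_scan = refill_by_sstable_scan(saved_keys, sstables)
print(by_key == by_scan)
```

Both functions build the same cache contents; the difference is only in how the SSTables are touched. The scan reads every row of every SSTable, so whether it wins in practice depends on how much of the data set is cached.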

> row_cache_keys_to_save in yaml may help you find a happy half way point.

If I can keep that high enough, given my data retention requirements, then save for the
absolute first get on a row I can operate entirely out of memory.

thanks!
Andras

Andras Szerdahelyi
Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
M: +32 493 05 50 88 | Skype: sandrew84



On 19 Nov 2012, at 22:00, aaron morton <aaron@thelastpickle.com> wrote:

> I was just wondering if anyone else is experiencing very slow ( ~ 3.5 MB/sec ) re-fill of
> the row cache at start up.
It was mentioned the other day.

What version are you on?
Do you know how many rows were loaded? When complete, it will log a message with the pattern

"completed loading (%d ms; %d keys) row cache for %s.%s"

> How is the "saved row cache file" processed?

In version 1.1, after the SSTables have been opened, the keys in the saved row cache are read
one at a time and the whole row is read into memory. This is a single-threaded operation.

In 1.2, reading the saved cache is still single-threaded, but reading the rows goes through
the read thread pool, so it happens in parallel.
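
The difference between the two versions, as described above, can be illustrated with a small Python sketch. This is not Cassandra's code (which is Java): `read_row` is a hypothetical stand-in for reading a whole row from the SSTables, and the pool size of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def read_row(key):
    """Hypothetical stand-in for reading one whole row from the SSTables."""
    return {"key": key}


def load_single_threaded(saved_keys):
    """1.1-style, as described: keys are read one at a time, each row read inline."""
    return {key: read_row(key) for key in saved_keys}


def load_with_read_pool(saved_keys, threads=4):
    """1.2-style, as described: one thread walks the keys, row reads go to a pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Submitting is still sequential; the row reads themselves overlap.
        futures = [(key, pool.submit(read_row, key)) for key in saved_keys]
        return {key: future.result() for key, future in futures}


keys = ["k%d" % i for i in range(10)]
print(load_single_threaded(keys) == load_with_read_pool(keys))
```

Both loaders end up with the same cache contents; only the second lets several row reads be in flight at once, which matters when each read is a disk seek.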

In both cases I do not believe the cache is stored in token (or key) order.

> ( Admittedly whatever is going on is still much more preferable to starting with a cold row
> cache )

row_cache_keys_to_save in yaml may help you find a happy half way point.

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/11/2012, at 3:17 AM, Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com> wrote:

Hey list,

I was just wondering if anyone else is experiencing a very slow ( ~ 3.5 MB/sec ) re-fill of
the row cache at startup. We operate with a large row cache ( 10-15GB currently ) and we
already measure startup times in hours :-)

How is the "saved row cache file" processed? Are the cached row keys simply iterated over
and their respective rows read from SSTables, possibly creating random reads with small enough
SSTable files if the keys were not stored in a manner optimised for a quick re-fill? Or is
there a smarter algorithm at work ( i.e. scan through one SSTable at a time, filtering rows
that should be in the row cache ), so that this operation is purely disk I/O bound?

( Admittedly whatever is going on is still much more preferable to starting with a cold row
cache )

thanks!
Andras











