incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: row cache re-fill very slow
Date Wed, 21 Nov 2012 01:03:49 GMT
> INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java (line 451) completed loading (5175655 ms; 13259976 keys) row cache
So it was reading about 2,562 rows per second during startup. I'd say that's not unreasonable performance for 13 million rows. It will get faster in 1.2, but for now perhaps just have the cache save fewer keys.
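As a back-of-the-envelope check (a sketch, not anything from Cassandra itself), the rate follows directly from the two figures in that log line:

```python
# Sanity-check of the rate quoted above, using the figures from the log:
# 13,259,976 keys loaded in 5,175,655 ms.
keys = 13_259_976
elapsed_ms = 5_175_655

rows_per_second = keys / (elapsed_ms / 1000.0)
print(round(rows_per_second))  # about 2,562 rows per second
```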

> Would something like iterating over the SSTables instead, and throwing the rows that need to be in the cache at it, be feasible?
During startup we do not read the -Data.db component of the SSTables, only the -Index.db (and -Filter.db) components. The SSTables are also opened in parallel.
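A rough sketch of that startup shape, in Python rather than Cassandra's Java, and with entirely hypothetical names:

```python
# Hypothetical sketch of the startup described above: only the -Index.db
# and -Filter.db components of each SSTable are opened, and the SSTables
# are handled in parallel. Names are illustrative, not Cassandra's code.
from concurrent.futures import ThreadPoolExecutor

def open_components(generation):
    # The -Data.db component is deliberately absent here:
    # it is not read during startup.
    return (f"sstable-{generation}-Index.db",
            f"sstable-{generation}-Filter.db")

def open_all_sstables(generations, workers=4):
    # Open the SSTables in parallel rather than one at a time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(open_components, generations))

opened = open_all_sstables(range(3))
```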

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/11/2012, at 10:39 AM, Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com> wrote:

> Aaron,
> 
>> What version are you on ? 
> 
> 
> 1.1.5 
> 
>> Do you know how many rows were loaded ?
> 
> INFO [OptionalTasks:1] 2012-11-19 13:08:58,868 ColumnFamilyStore.java (line 451) completed loading (5175655 ms; 13259976 keys) row cache
> 
>> In both cases I do not believe the cache is stored in token (or key) order. 
> 
> Am I getting this right: the row keys are read, and the rows are retrieved from SSTables in the order their keys appear in the cache file?
> Would something like iterating over the SSTables instead, and throwing the rows that need to be in the cache at it, be feasible? If the SSTables themselves are written sequentially at compaction time, which is how I remember they are written, then SSTable-sized sequential reads with a filter (a bloom filter for the row cache? :-) ) should be faster than reads from all across the column family (I have HDDs and about 1k SSTables).
> 
>> row_cache_keys_to_save in the yaml may help you find a happy halfway point.
> 
> 
> If I can keep that high enough, then given my data retention requirements, save for the absolute first get on a row, I can operate entirely out of memory.
> 
> thanks!
> Andras
> 
> Andras Szerdahelyi
> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
> M: +32 493 05 50 88 | Skype: sandrew84
> 
> 
> 
> 
> On 19 Nov 2012, at 22:00, aaron morton <aaron@thelastpickle.com> wrote:
> 
>>> I was just wondering if anyone else is experiencing very slow (~3.5 MB/sec) re-fill of the row cache at start up.
>> It was mentioned the other day.  
>> 
>> What version are you on ? 
>> Do you know how many rows were loaded? When complete, it will log a message with the pattern
>> 
>> "completed loading (%d ms; %d keys) row cache for %s.%s"
>> 
>>> How is the "saved row cache file" processed?
>> 
>> In version 1.1, after the SSTables have been opened, the keys in the saved row cache are read one at a time and each whole row is read into memory. This is a single-threaded operation.
>> 
>> In 1.2, reading the saved cache is still single-threaded, but reading the rows goes through the read thread pool, so it happens in parallel.
>> 
>> In both cases I do not believe the cache is stored in token (or key) order. 
>> 
>>> (Admittedly, whatever is going on is still much preferable to starting with a cold row cache.)
>> 
>> row_cache_keys_to_save in the yaml may help you find a happy halfway point.
>> 
>> Cheers
>> 
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 20/11/2012, at 3:17 AM, Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com> wrote:
>> 
>>> Hey list,
>>> 
>>> I was just wondering if anyone else is experiencing very slow (~3.5 MB/sec) re-fill of the row cache at start up. We operate with a large row cache (10-15GB currently) and we already measure startup times in hours :-)
>>> 
>>> How is the "saved row cache file" processed? Are the cached row keys simply iterated over and their respective rows read from the SSTables (possibly creating random reads, with small enough sstable files, if the keys were not stored in a manner optimised for a quick re-fill)? Or is there a smarter algorithm at work (i.e. scan through one sstable at a time and filter for rows that should be in the row cache), so that the operation is purely disk-i/o bound?
>>> 
>>> (Admittedly, whatever is going on is still much preferable to starting with a cold row cache.)
>>> 
>>> thanks!
>>> Andras
>>> 
>>> 
>>> 
>>> Andras Szerdahelyi
>>> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
>>> M: +32 493 05 50 88 | Skype: sandrew84
>>> 
>>> 
>>> 
>>> 
>> 
> 

