cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: SSTable Index and Metadata - are they cached in RAM?
Date Fri, 17 Aug 2012 08:54:10 GMT
> 2) Read from disk all row keys, in order to find one (binary search)
No.
At startup Cassandra samples the -index.db component, keeping every index_interval-th key in
memory. At worst, index_interval keys must be read from disk.
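
To illustrate the sampling idea described above, here is a minimal Python sketch. It is not Cassandra's actual code: build_sample and lookup are hypothetical names, a plain sorted list stands in for the -index.db file, and list positions stand in for file offsets. Real SSTables also order rows by partitioner token rather than by raw key, which this sketch ignores.

```python
import bisect

def build_sample(sorted_keys, index_interval):
    """Keep every index_interval-th key with its position (stand-in for a file offset)."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), index_interval)]

def lookup(sorted_keys, sample, index_interval, key):
    """Return (position, keys_scanned); position is None if the key is absent."""
    sampled_keys = [k for k, _ in sample]
    # In-memory binary search for the last sampled key <= key.
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None, 0  # key sorts before the first key in this file
    start = sample[slot][1]
    # "On disk" part: scan forward, touching at most index_interval entries.
    scanned = 0
    for pos in range(start, min(start + index_interval, len(sorted_keys))):
        scanned += 1
        if sorted_keys[pos] == key:
            return pos, scanned
        if sorted_keys[pos] > key:
            break  # passed the spot where the key would be
    return None, scanned

keys = [f"k{i:03d}" for i in range(1000)]
sample = build_sample(keys, 128)
pos, scanned = lookup(keys, sample, 128, "k500")
print(pos, scanned)
```

The binary search runs only over the in-memory sample, so the number of index entries read from the file is bounded by index_interval no matter how many keys the file holds.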

> As I understand it, in the worst case we can have three disk seeks (2, 4, 6) per SSTable
> in order to check whether it contains a given column. Is that correct?
It depends on the size of the row. For a small row (smaller than column_index_size_in_kb),
getting a specific column takes:
* 1 seek in index.db
* 1 seek in data.db 

> I would expect that the sorted row keys (from point 2) already contain a bloom filter for
> their columns. But the bloom filter is stored together with the column index, is that correct?
Yes

Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/08/2012, at 7:31 PM, Maciej Miklas <mac.miklas@gmail.com> wrote:

> Great articles, I had not found those before!
> 
> SSTable Index - yes I mean column Index.
> 
> I would like to understand how many disk seeks might be required to find a column in a single
> SSTable.
> 
> I am assuming a positive bloom filter result on the row key. Now Cassandra needs to find out whether
> the given SSTable contains the column name, and this might require a few disk seeks:
> 1) Check key cache, if found go to 5)
> 2) Read from disk all row keys, in order to find one (binary search)
> 3) Found row key contains disk offset to its column index
> 4) Read from disk column index for our row key. Index also contains a bloom filter on column
> names
> 5) Use bloom filter on column name to find out whether this SSTable might contain our
> column
> 6) Read column to finally make sure that it exists
> 
> As I understand it, in the worst case we can have three disk seeks (2, 4, 6) per SSTable
> in order to check whether it contains a given column. Is that correct?
> 
> I would expect that the sorted row keys (from point 2) already contain a bloom filter for
> their columns. But the bloom filter is stored together with the column index, is that correct?
> 
> 
> Cheers,
> Maciej
> 
> On Fri, Aug 17, 2012 at 12:06 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> What about SSTable index, 
> Not sure what you are referring to there. Each row in an SSTable has a bloom filter
> and may have an index of columns. This is not cached.
> 
> See http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ or http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance
> 
>>  and Metadata?
> 
> This is the metadata we hold in memory for every open SSTable:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java
> 
> Cheers
>   
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/08/2012, at 7:34 PM, Maciej Miklas <mac.miklas@gmail.com> wrote:
> 
>> Hi all,
>> 
>> The bloom filter for row keys is always in RAM. What about the SSTable index and metadata?
>> 
>> Are they cached by Cassandra, or does it rely on memory-mapped files?
>> 
>> 
>> Thanks,
>> Maciej
> 
> 

