2) Read from disk all row keys, in order to find one (binary search)
No.
At startup Cassandra samples the -index.db component every index_interval keys. At worst, index_interval keys must be read from disk to find a row key.
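
A rough sketch of that sampling idea, in Python rather than Cassandra's own code. The names build_sample and lookup are made up for illustration, and a plain sorted list of strings stands in for the on-disk -index.db component (an assumption; the real ordering and file layout are not modelled here). The sample is searched in memory, then at most index_interval entries are scanned from the sampled position:

```python
import bisect


def build_sample(sorted_keys, index_interval):
    """Keep every index_interval-th key together with its position in the full index."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), index_interval)]


def lookup(sorted_keys, sample, index_interval, key):
    """Return (position or None, number of full-index entries scanned)."""
    sample_keys = [k for k, _ in sample]
    # Rightmost sampled key <= target; this search touches only the in-memory sample.
    slot = bisect.bisect_right(sample_keys, key) - 1
    if slot < 0:
        return None, 0  # key sorts before the first index entry
    start = sample[slot][1]
    scanned = 0
    # Scan forward from the sampled position; this models the reads from disk.
    for pos in range(start, min(start + index_interval, len(sorted_keys))):
        scanned += 1
        if sorted_keys[pos] == key:
            return pos, scanned
        if sorted_keys[pos] > key:
            break  # passed the slot where the key would be
    return None, scanned


keys = [f"key{i:04d}" for i in range(1000)]
sample = build_sample(keys, 128)
pos, scanned = lookup(keys, sample, 128, "key0383")
```

With an interval of 128, the last key before the next sample point (key0383 above) is the worst case: the scan reads 128 entries, and never more.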

As I understand, in the worst case, we can have three disk seeks (2, 4, 6) per SSTable in order to check whether it contains a given column, is that correct?
It depends on the size of the row. For a small row (less than column_index_size_in_kb), getting a specific column takes:
* 1 seek in index.db
* 1 seek in data.db 

I would expect that the sorted row keys (from point 2) already contain a bloom filter for their columns. But the bloom filter is stored together with the column index, is that correct?
Yes

Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 17/08/2012, at 7:31 PM, Maciej Miklas <mac.miklas@gmail.com> wrote:

Great articles, I did not find those before!

SSTable index: yes, I mean the column index.

I would like to understand how many disk seeks might be required to find a column in a single SSTable.

I am assuming a positive bloom filter on the row key. Now Cassandra needs to find out whether the given SSTable contains the column name, and this might require a few disk seeks:
1) Check key cache, if found go to 5)
2) Read from disk all row keys, in order to find one (binary search)
3) Found row key contains disk offset to its column index
4) Read from disk column index for our row key. Index also contains bloom filter on column names
5) Use bloom filter on column name, to find out whether this SSTable might contain our column
6) Read column to finally make sure that it exists
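
To make steps 5 and 6 concrete, here is a minimal bloom filter sketch in Python. It is only an illustration of the technique, not Cassandra's implementation: the class name, the bit count, and the use of SHA-256 as the hash are all assumptions chosen for brevity. A negative answer is definite, while a positive answer only means "maybe", which is why step 6 still has to read the column.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter over column names (illustrative, not Cassandra's code)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False means definitely absent; True means present or a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


column_filter = BloomFilter()
for name in ("email", "first_name", "last_name"):
    column_filter.add(name)

if column_filter.might_contain("email"):
    pass  # only now is it worth reading the column from disk (step 6)
```

Added names always pass the check, so the filter can skip an SSTable safely but can never confirm on its own that a column exists.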

As I understand, in the worst case, we can have three disk seeks (2, 4, 6) per SSTable in order to check whether it contains a given column, is that correct?

I would expect that the sorted row keys (from point 2) already contain a bloom filter for their columns. But the bloom filter is stored together with the column index, is that correct?


Cheers,
Maciej

On Fri, Aug 17, 2012 at 12:06 AM, aaron morton <aaron@thelastpickle.com> wrote:
What about SSTable index, 
Not sure what you are referring to there. Each row in an SSTable has a bloom filter and may have an index of columns. This is not cached.


 and Metadata?
This is the metadata we hold in memory for every open SSTable.

Cheers
  

-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 16/08/2012, at 7:34 PM, Maciej Miklas <mac.miklas@gmail.com> wrote:

Hi all,

The bloom filter for row keys is always in RAM. What about SSTable index, and Metadata?

Is it cached by Cassandra, or does it rely on memory-mapped files?


Thanks,
Maciej