cassandra-user mailing list archives

From Michał Michalski <>
Subject Re: is there a key to sstable index file?
Date Thu, 18 Jul 2013 06:01:37 GMT
SSTables are immutable - once they're written to disk, they cannot be changed.

On read, C* checks *all* SSTables [1], but to make this faster it uses
Bloom Filters, which can tell you that a row is *not* in a specific
SSTable, so you don't have to read that SSTable at all. When you do have
to read one, you don't read the whole SSTable - there's an
in-memory Index Sample that is binary-searched to narrow the lookup down
to a (relatively) small block of the real (full, on-disk) index, which you
then scan to find the position of the data in the SSTable.
Additionally, there's a KeyCache to make reads faster - it points to the
location of the data in the SSTable, so you don't have to touch the Index
Sample and the Index at all.
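
To make the lookup order above concrete, here is a minimal sketch of a single SSTable's read path in Python. It is a toy model, not Cassandra's actual on-disk format: the class name ToySSTable, the sample_every and bloom_bits parameters, and the SHA-256 based Bloom filter are all assumptions made for illustration, and rows are sorted by raw key rather than by partitioner token.

```python
import bisect
import hashlib


class ToySSTable:
    """Toy model of one SSTable's read path (hypothetical, not Cassandra's layout)."""

    def __init__(self, rows, sample_every=4, bloom_bits=1024):
        # "Data file": (key, value) pairs sorted by key; a list position
        # stands in for a byte offset.
        self.data = sorted(rows.items())
        # Full "on-disk" index: one (key, position) entry per row, in key order.
        self.index = [(key, pos) for pos, (key, _) in enumerate(self.data)]
        # In-memory index sample: the key of every Nth index entry.
        self.sample_every = sample_every
        self.sample_keys = [key for key, _ in self.index[::sample_every]]
        # Bloom filter, modelled as the set of bit positions that are set.
        self.bloom_bits = bloom_bits
        self.bloom = set()
        for key in rows:
            self.bloom.update(self._bit_positions(key))
        # Key cache: key -> position in the data file.
        self.key_cache = {}

    def _bit_positions(self, key):
        # Derive three bit positions from one digest (an illustrative choice).
        digest = hashlib.sha256(key.encode()).digest()
        return [int.from_bytes(digest[i:i + 4], "big") % self.bloom_bits
                for i in (0, 4, 8)]

    def might_contain(self, key):
        # False means "definitely not here"; True means "maybe here".
        return all(bit in self.bloom for bit in self._bit_positions(key))

    def get(self, key):
        if not self.might_contain(key):
            return None  # skip this SSTable entirely
        if key in self.key_cache:
            # Key cache hit: no index sample or index access needed.
            return self.data[self.key_cache[key]][1]
        # Binary search the sample for the last sampled key <= key.
        i = bisect.bisect_right(self.sample_keys, key) - 1
        if i < 0:
            return None  # key sorts before the first key in this SSTable
        # Scan only the small index block that starts at the sampled entry.
        start = i * self.sample_every
        for indexed_key, pos in self.index[start:start + self.sample_every]:
            if indexed_key == key:
                self.key_cache[key] = pos
                return self.data[pos][1]
        return None  # Bloom filter false positive


table = ToySSTable({f"key{i:02d}": f"v{i}" for i in range(10)})
first = table.get("key07")   # goes through the sample and the index block
second = table.get("key07")  # served via the key cache
```

The point of the sample is that only one block of sample_every index entries is scanned per lookup, instead of the whole index.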

Once C* retrieves all data "parts" (including the Memtable part), 
timestamps are used to find the most recent version of data.
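
A rough sketch of that last step, assuming each "part" is a dict mapping a column name to a (timestamp, value) pair. The reconcile function and the sample data are hypothetical, and tombstones and timestamp ties are ignored here:

```python
def reconcile(*fragments):
    """Merge row fragments; for each column the highest timestamp wins."""
    merged = {}
    for fragment in fragments:
        for column, (timestamp, value) in fragment.items():
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    # Drop the timestamps once the winners are known.
    return {column: value for column, (_, value) in merged.items()}


# "abc" was written first and flushed to one SSTable, "def" later to another.
memtable = {"email": (30, "new@example.com")}
sstable_1 = {"name": (10, "abc"), "email": (10, "old@example.com")}
sstable_2 = {"name": (20, "def")}

row = reconcile(memtable, sstable_1, sstable_2)
```

Here row ends up with name "def" and the email from the Memtable, regardless of the order in which the parts are passed in.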

[1] I believe that it's not true for all cases, as I saw a piece of code 
somewhere in the source, that starts checking SSTables in order from the 
newest to the oldest one (in terms of data timestamps - AFAIR SSTable 
MetaData stores info about smallest and largest timestamp in SSTable), 
and once the newest data for all columns are retrieved (assuming that 
schema is defined), retrieving data stops and older SSTables are not 
checked. If someone could confirm that it works this way and it's not 
something that I saw in my dream and now believe it's real, I'd be glad ;-)
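
If it does work the way [1] describes, the idea could look roughly like the sketch below. This is not taken from the Cassandra source: the function name, the dict layout, and the exact stop condition (every wanted column already found with a timestamp newer than the largest timestamp the next SSTable can hold) are assumptions for illustration.

```python
def read_newest_first(sstables, wanted_columns):
    """Check SSTables from newest to oldest and stop as soon as possible.

    Each SSTable is a dict with 'max_timestamp' (from its metadata) and
    'row', a {column: (timestamp, value)} fragment for the requested key.
    Returns the merged values and the number of SSTables actually read.
    """
    result = {}
    checked = 0
    ordered = sorted(sstables, key=lambda t: t["max_timestamp"], reverse=True)
    for table in ordered:
        # Nothing in this (or any older) SSTable can beat what we already
        # hold if every wanted column is newer than its largest timestamp.
        if all(c in result and result[c][0] > table["max_timestamp"]
               for c in wanted_columns):
            break
        checked += 1
        for column, (timestamp, value) in table["row"].items():
            if column in wanted_columns and (
                    column not in result or timestamp > result[column][0]):
                result[column] = (timestamp, value)
    return {c: v for c, (_, v) in result.items()}, checked


sstables = [
    {"max_timestamp": 10,
     "row": {"name": (10, "abc"), "email": (5, "old@example.com")}},
    {"max_timestamp": 30,
     "row": {"name": (30, "def"), "email": (25, "new@example.com")}},
    {"max_timestamp": 20, "row": {"name": (20, "xyz")}},
]

values, checked = read_newest_first(sstables, {"name", "email"})
```

In this example only the newest SSTable is read, because both requested columns are found there with timestamps above 20, the largest timestamp of the next candidate.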

On 17.07.2013 22:58, S Ahmed wrote:
> Since SSTables are mutable, and they are ordered, does this mean that there
> is an index of key ranges that each SSTable holds, and the value could be in 1 or
> more sstables that have to be scanned and then the latest one is chosen?
> e.g. Say I write a value "abc" to CF1.  This gets stored in a sstable.
> Then I write "def" to CF1, this gets stored in another sstable eventually.
> Now when I go to fetch the value, it has to scan 2 sstables and then figure
> out which is the latest entry, correct?
> So is there an index of keys to sstables, and there can be 1 or more
> sstables per key?
> (This is assuming compaction hasn't occurred yet).
