java8964 java8964 <>
questions related to the SSTable file
Tue, 17 Sep 2013 01:51:52 GMT
Hi, I have some questions about SSTables in Cassandra, as I am doing a project that uses
them, and I hope someone on this list can share some thoughts.
My understanding is that SSTables are per column family, but each column family can have
multiple SSTable files. At runtime, one row COULD be split across more than one SSTable file.
This is not good for performance, but it does happen, and Cassandra will try to merge the
row's data into one SSTable file during compaction.
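
To make my mental model concrete, here is a toy sketch in Python. It is not Cassandra's code: the dict layout and the `merge_row` helper are names I made up, and it ignores tombstones and TTLs. It only assumes that each column carries a write timestamp and that the newest timestamp wins when fragments of a row are merged.

```python
# Toy model only: each "SSTable" is a dict of
#   row key -> {column name: (value, timestamp)}
# These structures are hypothetical, not Cassandra's on-disk format.

def merge_row(sstables, row_key):
    """Merge the fragments of one row, keeping the newest timestamp per column."""
    merged = {}
    for sstable in sstables:
        for name, (value, ts) in sstable.get(row_key, {}).items():
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return merged

# key1 is spread over two files; column "b" was rewritten later.
sstable1 = {"key1": {"a": ("1", 100), "b": ("2", 100)}}
sstable2 = {"key1": {"b": ("3", 200), "c": ("4", 200)}}

print(merge_row([sstable1, sstable2], "key1"))
```

In this model each column lives whole in one file or the other, which is exactly the boundary I am asking about below.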
The question is: when one row is split across multiple SSTable files, what is the boundary?
Or let me ask it this way: if one row exists in 2 SSTable files, and I run the sstable2json
tool on both SSTable files individually:
1) I would expect the same row key to show up in both sstable2json outputs, as this one row
exists in both SSTable files, right?
2) If so, what is the boundary? Will Cassandra guarantee the column level as the boundary?
What I mean is that one column's data is guaranteed to be either in the first file or in the
2nd file, right? There is no chance that Cassandra will cut the data of one column into 2
parts, with one part stored in the first SSTable file and the other part stored in the second
SSTable file. Is my understanding correct?
3) If we are talking only about the SSTable files in snapshots and incremental backups,
excluding the runtime SSTable files, will anything change? For snapshot or incremental backup
SSTable files, can one row's data still exist in more than one SSTable file? And does the
boundary change in this case?
4) If I want to use incremental backup SSTable files as the way to catch changed data, is it
a good way to do what I am trying to achieve? In this case, what happens in the following
example:
For column family A:
At Time 0, one row key (key1) has some data. It is stored and backed up in an SSTable file.
At Time 1, if any column of key1 changes (a new column inserted, a column updated/deleted,
or even the whole row deleted), I would expect the whole row to exist in any incremental
backup SSTable files taken after Time 1, right?
What happens if the above row just happens to be stored in more than one SSTable file?
At Time 0, one row key (key1) has some data, and it happens to be stored in SSTable file1
and file2, and is backed up.
At Time 1, if one column is added to row key1, and the change in fact happens only in
SSTable file2, and we do an incremental backup after that, which SSTable files should I
expect in this backup? Both SSTable files? Or just SSTable file2?
I was thinking incremental backup SSTable files are a good candidate for catching changed
data, but the fact that one row's data can exist in multiple SSTable files makes things
complex. Does anyone have experience using SSTable files this way? What are the lessons?
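
For reference, the check I have in mind for questions 1 and 2 could be sketched roughly as below. This is only a sketch. I am assuming each sstable2json dump parses to a JSON list of objects with "key" and "columns" fields, where each column is a list whose first element is the column name. The layout may differ between Cassandra versions, and the helper names `columns_by_key` and `shared_rows` are my own.

```python
import json

def columns_by_key(dump):
    """Map row key -> set of column names for one parsed sstable2json dump.

    Assumes the dump is a list of {"key": ..., "columns": [[name, value, ts], ...]}.
    """
    return {row["key"]: {col[0] for col in row["columns"]} for row in dump}

def shared_rows(dump_a, dump_b):
    """Return row keys present in both dumps, with the column names seen in each."""
    a, b = columns_by_key(dump_a), columns_by_key(dump_b)
    return {key: (a[key], b[key]) for key in a.keys() & b.keys()}

# Hand-written sample data standing in for the output of two sstable2json runs.
dump1 = json.loads('[{"key": "key1", "columns": [["a", "1", 100]]}]')
dump2 = json.loads(
    '[{"key": "key1", "columns": [["b", "2", 200]]},'
    ' {"key": "key2", "columns": [["a", "9", 200]]}]'
)

print(shared_rows(dump1, dump2))
```

If my understanding is right, a row split across two files would show up in the result with whole columns on each side, never a partial column.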
