incubator-cassandra-user mailing list archives

From java8964 java8964 <java8...@hotmail.com>
Subject RE: questions related to the SSTable file
Date Wed, 18 Sep 2013 01:51:25 GMT
Quote: 
"
To be clear, "incremental backup" feature backs up the data being modified in that period,
because it writes only those files to the incremental backup dir as hard links, between full
snapshots."
I thought I had this clear, but your clarification confused me again. From all the answers
I have received so far, I believe a more accurate statement would be: the "incremental backup"
feature backs up the SSTable files generated in that period.
But there is no way we can be sure that these SSTable files will ONLY contain modified data.
So the statement quoted above is not exactly right. I agree that all the data modified
in that period will be in the incremental SSTable files, but a lot of other unmodified data
will be in them too.
Suppose two rows with different row keys are in the same memtable, and only the second row is
modified. When the memtable is flushed to an SSTable file, the file will contain both rows, and
both will be in the incremental backup files. So nothing changed for the first row, but it will
still be in the incremental backup.
Or suppose I have one row with one column, and a new column is added. If the whole row sits in
one memtable that is flushed to an SSTable file, the whole row also ends up in the incremental
backup. Nothing changed for the first column, but it will still be in the incremental backup file.
The point I am trying to make is that this matters if I design an ETL to consume the incremental
backup SSTable files. As in the examples above, I have to account for the fact that the incremental
backup SSTable files could, and most likely will, contain old data that was already processed.
That requires additional logic in the ETL, or in any other outside SSTable consumer, to
handle it.
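To make the extra ETL logic concrete, here is a rough sketch of the kind of filter I have in
mind. It assumes the SSTable files have already been decoded into cells by some other tool; the
Cell type, the new_cells function and the sample values are all made up for illustration and
are not part of any Cassandra API. It also ignores deletes (tombstones), which a real consumer
would have to handle.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class Cell(NamedTuple):
    """Hypothetical shape of one decoded cell; not a Cassandra type."""
    row_key: str
    column: str
    value: str
    write_ts: int  # write timestamp of the cell


def new_cells(cells: Iterable[Cell],
              seen: Dict[Tuple[str, str], int]) -> Iterator[Cell]:
    """Yield only cells newer than what the ETL has already processed.

    `seen` maps (row_key, column) to the highest write timestamp handled
    so far, and is updated in place as cells are accepted.
    """
    for cell in cells:
        key = (cell.row_key, cell.column)
        previous = seen.get(key)
        if previous is None or cell.write_ts > previous:
            seen[key] = cell.write_ts
            yield cell


# State the ETL would persist between runs (kept in memory here).
seen: Dict[Tuple[str, str], int] = {}

# First incremental file: one row with one column.
first_batch = [Cell("row1", "c1", "a", 100)]

# Second incremental file: the old column shows up again next to the new one.
second_batch = [Cell("row1", "c1", "a", 100),   # already processed
                Cell("row1", "c2", "b", 200)]   # genuinely new

processed_first = list(new_cells(first_batch, seen))
processed_second = list(new_cells(second_batch, seen))
```

With this filter the second run only passes on the new column, even though the second file
repeats the old one.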
Yong
Date: Tue, 17 Sep 2013 18:01:45 -0700
Subject: Re: questions related to the SSTable file
From: rcoli@eventbrite.com
To: user@cassandra.apache.org

On Tue, Sep 17, 2013 at 5:46 PM, Takenori Sato <tsato@cloudian.com> wrote:

> So in fact, incremental backup of Cassandra is just hard link all the new SSTable files
> being generated during the incremental backup period. It could contain any data, not just
> the data being update/insert/delete in this period, correct?


Correct.
But over time, some old enough SSTable files are usually shared across multiple snapshots.


To be clear, "incremental backup" feature backs up the data being modified in that period,
because it writes only those files to the incremental backup dir as hard links, between full
snapshots.

http://www.datastax.com/docs/1.0/operations/backup_restore
"When incremental backups are enabled (disabled by default), Cassandra hard-links each flushed
SSTable to a backups directory under the keyspace data directory. This allows you to store
backups offsite without transferring entire snapshots. Also, incremental backups combine with
snapshots to provide a dependable, up-to-date backup mechanism.
"

What Takenori is referring to is that a full snapshot is in some ways an "incremental backup"
because it shares hard linked SSTables with other snapshots.
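
For anyone unsure what hard linking means at the file system level, here is a small sketch. It
is not Cassandra code: the file name is invented and the layout only mimics the "backups"
directory mentioned in the docs quoted above. It shows that a hard link is a second name for
the same file, so no data is copied and the backup entry survives removal of the original name.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Stand-in for a freshly flushed SSTable (hypothetical file name).
    sstable = os.path.join(data_dir, "example-Data.db")
    with open(sstable, "wb") as f:
        f.write(b"flushed sstable contents")

    # Hard-link it into a backups directory instead of copying it.
    backups = os.path.join(data_dir, "backups")
    os.mkdir(backups)
    link = os.path.join(backups, "example-Data.db")
    os.link(sstable, link)

    # Both names point at the same inode; the link count is now 2.
    same_inode = os.stat(sstable).st_ino == os.stat(link).st_ino
    link_count = os.stat(sstable).st_nlink

    # Removing the original name (as happens after compaction) leaves
    # the data reachable through the backup name.
    os.remove(sstable)
    with open(link, "rb") as f:
        surviving = f.read()
```

The same mechanism is why several snapshots can share one old SSTable without using extra
disk space.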

=Rob