incubator-cassandra-user mailing list archives

From Ben Standefer <...@simplegeo.com>
Subject Re: Data loss and corruption
Date Wed, 09 Jun 2010 07:20:22 GMT
In my opinion the #1 risk for corruption is user/client error with the
timestamps.  Over time, Cassandra flushes data from memory to disk.  After
it flushes to disk, Cassandra doesn't go back to delete or modify that data.
Because of this, deletes are performed by writing a "tombstone" to disk, and
when two versions of the same column exist, the one with the higher
timestamp wins.  This can lead to what looks like corruption if you change
the unit of the timestamps your clients produce after data has been
inserted.  For example, if you were originally using microseconds for
timestamps, you may have inserted a record with a timestamp of
1234567000000.  If you switched your Cassandra clients to use seconds for
the timestamp and attempted to delete that record 1 second later, the
tombstone would be written with timestamp 1234568, and since
1234568 < 1234567000000
the tombstone loses to the original write and the record would not be
deleted.  A de-facto standard of microseconds has been recommended to
clients, but it's important to ensure consistency if you switch clients or
start using a client in a different language.
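
To make the failure mode concrete, here is a minimal sketch of
highest-timestamp-wins reconciliation in Python.  It is a toy model, not
Cassandra's code: the names Version, reconcile and is_live are made up for
illustration, a value of None stands in for a tombstone, and ties between
equal timestamps are not modelled.

```python
from typing import NamedTuple, Optional, Sequence


class Version(NamedTuple):
    """One write to a column (hypothetical toy type, not a Cassandra API)."""
    timestamp: int          # supplied by the client, in whatever unit it uses
    value: Optional[str]    # None stands in for a tombstone


def reconcile(a: Version, b: Version) -> Version:
    # The version with the higher timestamp wins.
    # Equal timestamps are not handled faithfully in this sketch.
    return a if a.timestamp > b.timestamp else b


def is_live(versions: Sequence[Version]) -> bool:
    # Fold all versions down to a single winner, then check for a tombstone.
    winner = versions[0]
    for v in versions[1:]:
        winner = reconcile(winner, v)
    return winner.value is not None


insert = Version(1234567000000, "hello")     # written with microseconds
bad_delete = Version(1234568, None)          # 1 second later, but in seconds
good_delete = Version(1234568000000, None)   # 1 second later, in microseconds

print(is_live([insert, bad_delete]))    # True: the delete is ignored
print(is_live([insert, good_delete]))   # False: the delete takes effect
```

The delete issued in seconds carries a smaller number than the original
write, so it loses even though it happened later in wall-clock time.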

More discussion on timestamps:
http://comments.gmane.org/gmane.comp.db.cassandra.devel/1165

-Ben


On Tue, Jun 8, 2010 at 10:45 PM, Hector Urroz <hector@magpieti.com> wrote:

> Hi all,
>
> We're starting to prototype Cassandra for use in a production system and
> became concerned about data corruption after reading the excellent article:
>
>
> http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
>
> where Evan Weaver writes:
>
> "Cassandra is an alpha product and could, theoretically, lose your data. In
> particular, if you change the schema specified in the storage-conf.xml file,
> you must follow these instructions carefully, or corruption will occur (this
> is going to be fixed). Also, the on-disk storage format is subject to
> change, making upgrading a bit difficult."
>
> Is database corruption a well-known or common problem with Cassandra? What
> sources of information would you recommend to help devise a strategy to
> minimize corruption risk, and to detect and recover when corruption does
> occur?
>
> Thanks,
>
> Hector Urroz
>
>
