incubator-cassandra-user mailing list archives

From Colin <>
Subject Re: Avoiding High Cell Tombstone Count
Date Tue, 27 May 2014 21:55:16 GMT

I would be willing to help you out with your issues tomorrow afternoon, feel free to give
me a call after 4pm ET.  There are lots of people who store *and* update data with Cassandra
(at scale).

Colin Clark   | Solutions Architect
DataStax  | 
m | +1-320-221-9531
e  |

We power the big data applications that transform business.

More than 400 customers, including startups and twenty-five percent of the Fortune 100 rely
on DataStax's massively scalable, flexible, fast and continuously available big data platform
built on Apache Cassandra™. DataStax integrates in one cluster (thus requiring no ETL) 
enterprise-ready Cassandra, Apache Hadoop™ for analytics and Apache Solr™ for search,
across multiple data centers and in the cloud all while providing advanced enterprise security
features that keep data safe.

> On May 27, 2014, at 4:16 PM, Robert Coli <> wrote:
>> On Sun, May 25, 2014 at 12:01 PM, Charlie Mason <> wrote:
>> I have a table which has one column per user. It receives a lot of updates to these
columns throughout their lifetime. They are always updates on a few specific columns. Firstly,
is Cassandra storing a tombstone for each of these old column values?
>> ...
>> As you can see, that's an awful lot of tombstoned cells. That's after a full compaction
as well. Just so you are aware, this table is updated using a Paxos IF statement.
> If you do a lot of UPDATEs, perhaps a log structured database with immutable datafiles
from which row fragments are reconciled on read is not for you. Especially if you have to
use lightweight "transactions" to make your application semantics work.
>> Would I be better off adding a time-based key to the primary key, then doing a separate
insert and deleting the original? If I did the query with a limit of one, it should always
find the first row before hitting a tombstone. Is that correct?
> I have no idea what you're asking regarding a LIMIT of 1... in general anything that
scans over multiple partitions is bad. I'm pretty sure you almost always want to use a design
which allows you to use FIRST instead of LIMIT for this reason.
> The overall form of your questions suggests you might be better off using the right tool
for the job, which may not be Cassandra.
> =Rob
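
For readers following the thread, Rob's description of "immutable datafiles from which row fragments are reconciled on read" can be pictured with a small toy model. This is only a sketch under stated assumptions, not Cassandra's actual implementation: the names `Cell`, `Fragment` and `reconcile` are hypothetical, the sample data is made up, and the tie-breaking rule (first fragment wins on equal timestamps) is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell: one column value plus its write timestamp."""
    value: Optional[str]  # None stands in for a tombstone (delete marker)
    timestamp: int        # the cell with the highest timestamp wins


# One immutable "datafile" holds a fragment of the row: column name -> Cell.
Fragment = Dict[str, Cell]


def reconcile(fragments: List[Fragment]) -> Dict[str, str]:
    """Merge row fragments on read, keeping the newest cell per column."""
    newest: Dict[str, Cell] = {}
    for fragment in fragments:
        for column, cell in fragment.items():
            current = newest.get(column)
            if current is None or cell.timestamp > current.timestamp:
                newest[column] = cell
    # Columns whose newest cell is a tombstone are hidden from the reader.
    return {col: cell.value for col, cell in newest.items()
            if cell.value is not None}


# Made-up data: three writes to the same row, landing in three datafiles.
fragments = [
    {"email": Cell("a@example.com", 1), "name": Cell("Ann", 1)},  # insert
    {"email": Cell("b@example.com", 2)},                          # update
    {"name": Cell(None, 3)},                                      # delete
]
row = reconcile(fragments)
```

In this toy model every read has to look at all three fragments to return one live column, which is the read-side cost Rob is pointing at for update-heavy rows; older cells and tombstones only go away when the datafiles are rewritten.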
