cassandra-user mailing list archives

From Kevin Burton <>
Subject Re: Efficient bulk range deletions without compactions by dropping SSTables.
Date Wed, 14 May 2014 20:44:56 GMT
> We basically do this same thing in one of our production clusters, but
> rather than dropping SSTables, we drop Column Families. We time-bucket our
> CFs, and when a CF has passed some time threshold (metadata or embedded in
> CF name), it is dropped. This means there is a home-grown system that is
> doing the bookkeeping/maintenance rather than relying on C*s inner
> workings. It is unfortunate that we have to maintain a system which
> maintains CFs, but we've been in a pretty good state for the last 12 months
> using this method.
Yup.  This is exactly what we do for MySQL, but it's kind of a shame to do
it with Cassandra, since the SSTable system could support it directly.  I
had been working on a bigtable implementation (now on hold) that supported
this feature.  Since Cassandra could do it directly, it seems a shame that
it doesn't.
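The time-bucketing scheme described in the quoted text could be sketched roughly as follows; the table-name format, the one-day bucket, and the 30-day retention window are all assumptions for illustration, not details from the original post:

```python
from datetime import datetime, timedelta, timezone

# Assumed parameters: one column family per day, dropped after 30 days.
BUCKET = timedelta(days=1)
RETENTION = timedelta(days=30)

def table_for(ts: datetime) -> str:
    """Map a timestamp to its bucketed column-family name, e.g. events_20140514."""
    return "events_%s" % ts.strftime("%Y%m%d")

def expired_tables(existing: list[str], now: datetime) -> list[str]:
    """Return the bucketed tables whose entire time range is past retention."""
    cutoff = now - RETENTION
    out = []
    for name in existing:
        day = datetime.strptime(name.split("_")[-1], "%Y%m%d").replace(tzinfo=timezone.utc)
        if day + BUCKET <= cutoff:  # even the bucket's newest row is past the cutoff
            out.append(name)
    return out
```

The home-grown bookkeeping job would then issue a `DROP TABLE` for each name returned by `expired_tables`.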

Also, this means you generally have to build a duplicate query layer on top
of CQL.  For example, if a range query spans a time boundary between
tables, you have to scan both.
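That fan-out layer might look something like this sketch, assuming the daily bucket naming from before; `run_query` is a hypothetical stub standing in for a real CQL `SELECT` against one table:

```python
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(days=1)  # assumed: one table per day

def buckets_overlapping(start: datetime, end: datetime) -> list[str]:
    """All daily bucket tables touched by the half-open range [start, end)."""
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    names = []
    while day < end:
        names.append("events_%s" % day.strftime("%Y%m%d"))
        day += BUCKET
    return names

def range_query(start, end, run_query):
    """Scan every overlapping bucket table and merge the per-table results."""
    rows = []
    for table in buckets_overlapping(start, end):
        rows.extend(run_query(table, start, end))
    return rows
```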

And if you're doing lookups by key, you also have to scan all of the
temporal column families, which is essentially what Cassandra already does
with SSTables.

> Some caveats:
> By default, C* makes snapshots of your data when a table is dropped. You
> can leave that and have something else clear up the snapshots, or if you're
> less paranoid, set auto_snapshot: false in the cassandra.yaml file.
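For reference, the setting mentioned above is a single line in `cassandra.yaml`:

```yaml
# Don't snapshot on TRUNCATE/DROP. Only do this if something else
# (e.g. backups) can recover from an accidental drop.
auto_snapshot: false
```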
> Cassandra does not handle 'quick' schema changes very well, and we found
> that only one node should be used for these changes. When adding or
> removing column families, we have a single, property-defined C* node that
> is designated as the schema node. After making a schema change, we had to
> throw in an artificial delay to ensure that the change propagated through
> the cluster before making the next one. And of course, relying on a single
> node being up for schema changes is less than ideal, so handling failover
> to a new node is important.
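Rather than a fixed delay, one option is to poll until every node reports the same schema version. This is a sketch, not the poster's actual code; `get_schema_versions` is a hypothetical hook (e.g. backed by the system peer tables or driver metadata), and newer DataStax drivers expose a similar agreement check natively:

```python
import time

def wait_for_schema_agreement(get_schema_versions, timeout=30.0, interval=1.0):
    """Block until all nodes report one schema version; raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        versions = set(get_schema_versions())
        if len(versions) == 1:
            return versions.pop()  # the cluster-wide schema version
        time.sleep(interval)
    raise TimeoutError("schema did not converge within %.0fs" % timeout)
```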
> The final, and hardest, problem is that C* can't really handle schema
> changes while a node is being bootstrapped (new nodes, or replacing a dead
> node). If a column family is dropped, but the new node has not yet received
> that data from its replica, the node will fail to bootstrap when it finally
> begins to receive that data - there is no column family for the data to be
> written to, so that node will be stuck in the joining state, and its
> system keyspace needs to be wiped and re-synced to get back to a
> happy state. This unfortunately means we have to stop schema changes when a
> node needs to be replaced, but we have this flow down pretty well.
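The "stop schema changes while a node is bootstrapping" rule could be enforced with a guard like this sketch; `joining_nodes` is a hypothetical hook (for example, parsing `nodetool status` output for nodes in the joining state):

```python
def safe_to_change_schema(joining_nodes) -> bool:
    """Schema changes are only allowed when no node is joining the ring."""
    return len(joining_nodes()) == 0

def drop_table(table, joining_nodes, execute):
    """Drop a bucketed table, deferring if any node is bootstrapping."""
    if not safe_to_change_schema(joining_nodes):
        raise RuntimeError("node bootstrap in progress; deferring DROP of " + table)
    execute("DROP TABLE %s" % table)
```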
Nice.  This was excellent feedback.


Location: *San Francisco, CA*
Skype: *burtonator*
