incubator-cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: rename column family
Date Sun, 13 Feb 2011 19:40:51 GMT
There are functions on the Cassandra API to rename and drop column families, see
http://wiki.apache.org/cassandra/API. Dropping a CF does not immediately free up the disk space;
see the docs.

AFAIK the rename is not atomic across the cluster (that would require locks), so your best bet
would be to switch to a new CF in your code.

Reads and writes in Cassandra compete for resources (CPU and disk) but they will not block
each other, as there is no locking system. You may find the performance acceptable; if not,
just add more machines :)

Switching CFs may be a valid way to handle metadata-level bulk deletes, like horizontal partitions
in MS SQL and MySQL. Obviously it will depend on how much data you have and how much capacity
you have.
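
A minimal sketch of that switch, using an in-memory dict as a stand-in for the cluster. No Cassandra client calls are shown; `FakeStore`, `SwitchingReader`, and the CF names `data_v1` / `data_v2` are made up for illustration.

```python
class FakeStore:
    """In-memory stand-in for a keyspace: {cf_name: {row_key: columns}}."""

    def __init__(self):
        self.cfs = {}

    def create_cf(self, name):
        self.cfs[name] = {}

    def drop_cf(self, name):
        # One drop replaces the per-row deletes of the stale dataset.
        del self.cfs[name]


class SwitchingReader:
    """Reads go to whichever CF the app currently treats as live."""

    def __init__(self, store, active_cf):
        self.store = store
        self.active_cf = active_cf

    def get(self, key):
        return self.store.cfs[self.active_cf].get(key)

    def switch_to(self, new_cf):
        # Returns the name of the CF that just became stale.
        old_cf, self.active_cf = self.active_cf, new_cf
        return old_cf


store = FakeStore()
store.create_cf("data_v1")
store.cfs["data_v1"]["row1"] = {"col": "old"}
reader = SwitchingReader(store, "data_v1")

# Import the full new dataset into a fresh CF while reads continue on the old one.
store.create_cf("data_v2")
store.cfs["data_v2"]["row1"] = {"col": "new"}

# Flip only after the import has finished, then drop the stale CF.
stale = reader.switch_to("data_v2")
store.drop_cf(stale)
```

In a real cluster each app node would flip at its own time, so the old CF should only be dropped once every reader has switched, and as noted above the drop does not free the disk space right away.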

Let us know how you get on.

Cheers
Aaron
 
On 11/02/2011, at 11:33 AM, Karl Hiramoto <karl@hiramoto.org> wrote:

> On 02/10/11 22:19, Aaron Morton wrote:
>> That should read "Without more information"
>> 
>> A
>> On 11 Feb, 2011,at 10:15 AM, Aaron Morton <aaron@thelastpickle.com> wrote:
>> 
>>> With more information I'd say this is not a good idea.
>>> 
>>> I would suggest looking at why you do the table switch in the MySql
>>> version and consider if it's still necessary in the Cassandra version.
>>> 
> I do the table switch because it's the fastest way to rebuild an entire
> dataset. Say you're importing a flat CSV file; you have various cases.
> 
> 1.  Exact same data loaded, only update timestamp.
> 2.  New data that was not in previous dataset.
> 3.  Changed data from previous dataset (update).
> 4.  Data that is not in new data, but is in old (delete).
> 
> Rebuilding the entire table saves millions of search/delete operations.
> 
> 
> In MySQL, reading/writing the table at the same time (many millions of
> rows, many GB of data) slows things down beyond my strict performance
> requirements; doing the rename table makes both the reads and writes much
> faster. Yes, I know this probably doesn't apply to Cassandra.
> 
> If Cassandra could do something like the MySQL rename it would avoid
> having to do the deletes on individual rows, or the repair/compaction of
> the column family to remove all the stale data. Disk space usage is
> also very important. I know that after a new import is complete, all the
> old data is stale.
> 
>>> Could you use prefixes in your keys that the app knows about and
>>> switch those?
> Yes, but that makes the app more complex, and it needs to know when the
> data is consistent after the import. I think I would have to do a range
> scan to delete all the stale data.
> 
> A TTL would be risky as a TTL too high would waste disk space, and stale
> data would be around longer than wanted.    A TTL too low would risk not
> having data available if a new import should fail, or be delayed.
> 
> 
> --
> Karl
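
For reference, the four import cases listed in the quoted message can be sketched as a diff between two key-to-value mappings. This is only an illustration with made-up data held in plain dicts; `classify_import` is a hypothetical helper, not part of any Cassandra or MySQL API.

```python
def classify_import(old, new):
    """Split a new dataset against the old one into the four import cases."""
    return {
        # 1. Exact same data loaded: only the timestamp would be updated.
        "unchanged": [k for k in new if k in old and old[k] == new[k]],
        # 2. New data that was not in the previous dataset.
        "added": [k for k in new if k not in old],
        # 3. Changed data from the previous dataset (update).
        "changed": [k for k in new if k in old and old[k] != new[k]],
        # 4. Data that is in the old dataset but not the new one (delete).
        "deleted": [k for k in old if k not in new],
    }


old = {"a": 1, "b": 2, "c": 3}
new = {"a": 1, "b": 20, "d": 4}
result = classify_import(old, new)
```

Rebuilding the whole dataset skips this comparison, and in particular the per-key work for case 4, which is the saving described in the quoted message.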
