cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wim Deblauwe <>
Subject Re: Best way to delete by day?
Date Mon, 30 Jun 2014 08:28:49 GMT

Thanks for the answers.

Are you saying that I could store big binary files in Cassandra ? I have
read somewhere that if the file is more than 10 Mb, it is probably not such
a good idea? The binary files can be up to 50 or 100 Mb, no more in my case.

So the way I understand it, if I store the binary file outside of
Cassandra, I need to delete manually and go with strategy 2 since there are
no notifications.



2014-06-30 10:23 GMT+02:00 DuyHai Doan <>:

> Hello Wim
> TTL is a good fit for your requirement if you want Cassandra to handle the
> deletion task for you.
> Now, clearly there are 2 strategies:
> 1) Store data on the same partition (physical row) and set TTL to expire
> data automatically
> 2) Store data on several partitions, one for each day for example, and
> manage deletion manually or use TTL again
> If you have few data, strategy 1 is fine. If your data is huge and/or you
> need to reclaim disk space quickly (especially with the big binary file),
> you'll probably better off choosing strategy 2. The only drawback with
> strategy 2 is when you need querying data that span over several days,
> you'll have to issue many queries (one for each distinct day) or use the
> "IN" clause of CQL3 but this has a small performance overhead since.
> Do not forget to set gc_grace_seconds to 0 to have data removed quickly.
> About notification, it's not possible right now to be notified on the
> client side when an expiring column (column with TTL) is physically removed
> by Cassandra
> On Mon, Jun 30, 2014 at 9:59 AM, Wim Deblauwe <>
> wrote:
>> Hi,
>> I am getting started with Cassandra (coming from MySQL). I have made a
>> table with timeseries data (inspired on
>> ).
>> The table looks like this:
>> CREATE TABLE event_message (
>> message_id uuid,
>> message_source_id uuid,
>> message_time timestamp,
>> event_type_id varchar,
>> event_state varchar,
>> filter_state varchar,
>> image_id uuid,
>> device_specific_id bigint,
>> device_specific_begin_id bigint,
>> characteristics varchar,
>> PRIMARY KEY (message_source_id, message_time, message_id)
>> );
>> I have now 2 requirements:
>> 1) I need to remove rows after a certain (user settable) time (between 5
>> and 60 days). In MySQL, we used partitions by day to quickly delete a whole
>> day.
>> 2) I need to store a big binary file along with each row and this file
>> should be removed when the row is removed.
>> I was looking into the expiring columns (with the TTL), but is this a
>> good fit for this use case? Is this TTL stored between restarts of
>> Cassandra?
>> Would there be any advantage to use the system called "Partitioning to
>> limit row size – Time Series Pattern 2" in the URL and then explicitly
>> doing a delete of a whole day? With this system, if I query by time, do I
>> need to calculate what days are in the interval and explicitly add this in
>> my query to find the good partitions?
>> How can I get notifications if a row is expired when using TTL so I can
>> removed the associated file?
>> regards,
>> Wim

View raw message