From: Wim Deblauwe
To: user@cassandra.apache.org
Date: Mon, 30 Jun 2014 10:28:49 +0200
Subject: Re: Best way to delete by day?

Hi,

Thanks for the answers.

Are you saying that I could store big binary files in Cassandra? I have read somewhere that if a file is larger than 10 MB, it is probably not such a good idea. The binary files can be up to 50 or 100 MB, no more in my case.

So the way I understand it, if I store the binary files outside of Cassandra, I need to delete them manually and go with strategy 2, since there are no notifications.
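
To make the manual deletion concrete, here is a minimal sketch of what the cleanup could look like. It assumes a layout that is not part of this thread: the binary files live on disk under one directory per day, named YYYY-MM-DD. The function name `purge_expired_days` and that layout are hypothetical.

```python
import shutil
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import List


def purge_expired_days(root: Path, retention_days: int, today: date) -> List[str]:
    """Delete day directories (named YYYY-MM-DD) older than the retention window.

    Assumed layout (hypothetical): root/2014-06-30/<binary files>.
    Returns the names of the directories that were removed, in sorted order.
    """
    cutoff = today - timedelta(days=retention_days)
    removed = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        try:
            day = datetime.strptime(entry.name, "%Y-%m-%d").date()
        except ValueError:
            continue  # not a day directory, leave it alone
        if day < cutoff:
            shutil.rmtree(entry)  # removes the whole day in one call
            removed.append(entry.name)
    return removed
```

Run once a day, for example as `purge_expired_days(Path("/data/images"), 30, date.today())`, this removes a whole day of files at once, which mirrors dropping a daily partition in MySQL. The path is a placeholder.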

regards,

Wim


2014-06-30 10:23 GMT+02:00 DuyHai Doan <doanduyhai@gmail.com>:
Hello Wim

TTL is a good fit for your requirement if you want Cassandra to handle the deletion task for you.
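
As a rough illustration of the TTL approach, the sketch below converts the user-settable retention period (5 to 60 days, as stated in the original question) into seconds and builds an INSERT that uses the CQL `USING TTL` clause. The helper names are hypothetical, only a few columns of `event_message` are written to keep it short, and the statement is only built as a string here, not executed against a cluster.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_seconds(retention_days: int) -> int:
    """Convert a retention period in days (5..60) into a TTL in seconds."""
    if not 5 <= retention_days <= 60:
        raise ValueError("retention_days must be between 5 and 60")
    return retention_days * SECONDS_PER_DAY


def insert_with_ttl_cql(retention_days: int) -> str:
    """Build a CQL INSERT whose written cells expire after the retention period."""
    return (
        "INSERT INTO event_message "
        "(message_source_id, message_time, message_id, event_type_id) "
        "VALUES (?, ?, ?, ?) "
        f"USING TTL {ttl_seconds(retention_days)}"
    )
```

The TTL is attached to the data at write time, so changing the retention setting later only affects rows written after the change.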

Now, clearly there are 2 strategies:

1) Store data on the same partition (physical row) and set TTL to expire data automatically
2) Store data on several partitions, one for each day for example, and manage deletion manually or use TTL again

If you have little data, strategy 1 is fine. If your data is huge and/or you need to reclaim disk space quickly (especially with the big binary files), you'll probably be better off choosing strategy 2. The only drawback of strategy 2 is that when you need to query data spanning several days, you'll have to issue many queries (one for each distinct day) or use the "IN" clause of CQL3, which has a small performance overhead.
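
For the "one query per day" variant of strategy 2, the client has to work out which day partitions an interval touches. A minimal sketch follows. It assumes a hypothetical table `event_message_by_day` whose partition key is `(message_source_id, day)`; neither that table nor the `day` column appears in the schema posted in this thread.

```python
from datetime import date, datetime, timedelta
from typing import List

# Hypothetical per-day query: one execution per bucket returned by day_buckets().
SELECT_ONE_DAY_CQL = (
    "SELECT * FROM event_message_by_day "
    "WHERE message_source_id = ? AND day = ? "
    "AND message_time >= ? AND message_time <= ?"
)


def day_buckets(start: datetime, end: datetime) -> List[date]:
    """Return every calendar day touched by the interval [start, end], in order.

    The days must be computed in the same time zone that was used to fill
    the `day` column at write time (for example UTC).
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = []
    current = start.date()
    while current <= end.date():
        days.append(current)
        current += timedelta(days=1)
    return days
```

The client would then run `SELECT_ONE_DAY_CQL` once per returned day and concatenate the results, which are already ordered by `message_time` within each day.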

Do not forget to set gc_grace_seconds to 0 to have data removed quickly.

About notification, it's not possible right now to be notified on the client side when an expiring column (column with TTL) is physically removed by Cassandra.




On Mon, Jun 30, 2014 at 9:59 AM, Wim Deblauwe <wim.deblauwe@gmail.com> wrote:
Hi,

I am getting started with Cassandra (coming from MySQL). I have made a table with time series data (inspired by http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/ ).

The table looks like this:

CREATE TABLE event_message (
    message_id uuid,
    message_source_id uuid,
    message_time timestamp,
    event_type_id varchar,
    event_state varchar,
    filter_state varchar,
    image_id uuid,
    device_specific_id bigint,
    device_specific_begin_id bigint,
    characteristics varchar,
    PRIMARY KEY (message_source_id, message_time, message_id)
);

I now have 2 requirements:
1) I need to remove rows after a certain (user-settable) time (between 5 and 60 days). In MySQL, we used partitions by day to quickly delete a whole day.
2) I need to store a big binary file along with each row, and this file should be removed when the row is removed.

I was looking into expiring columns (with a TTL), but is this a good fit for this use case? Is the TTL preserved across restarts of Cassandra?

Would there be any advantage to using the system called "Partitioning to limit row size – Time Series Pattern 2" in the URL and then explicitly deleting a whole day? With this system, if I query by time, do I need to calculate which days are in the interval and explicitly add them to my query to find the right partitions?

How can I get notifications when a row expires via TTL so I can remove the associated file?

regards,

Wim

