From: Ilya Grebnov
To: user@cassandra.apache.org
Subject: Deleting old items during compaction (WAS: Deleting old items)
Date: Wed, 13 Feb 2013 02:08:57 -0800

Hi,

 

We are looking for a solution to the same problem. We have a wide column family with counters, and we want to delete old data, for example anything older than one month. One idea was to implement a hook in the compaction code and drop the columns we don't need. Is this a viable option?

 

Thanks,

Ilya

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Tuesday, February 12, 2013 9:01 AM
To: user@cassandra.apache.org
Subject: Re: Deleting old items

 

So is it possible to delete all the data inserted in some CF between 2 dates or data older than 1 month?

No. 

 

You need to issue row level deletes. If you don't know the row key you'll need to do range scans to locate them.

 

If you are deleting parts of wide rows, consider reducing the min_compaction_threshold on the CF to 2.

 

Cheers

 

 

-----------------

Aaron Morton

Freelance Cassandra Developer

New Zealand

 

@aaronmorton
http://www.thelastpickle.com
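
The approach Aaron describes (scan to find row keys you don't know in advance, then issue one row-level delete per key) can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary `cf` is a hypothetical in-memory stand-in for a column family, and `find_row_keys` and `delete_rows` are made-up helper names, not part of any Cassandra client. On a real cluster both steps would go through whatever client library you use.

```python
from typing import Dict, Iterable, List

# Hypothetical in-memory stand-in for a column family:
# row key -> {column name: counter value}. Not a real Cassandra API.
ColumnFamily = Dict[str, Dict[str, int]]


def find_row_keys(cf: ColumnFamily, prefix: str) -> List[str]:
    """Scan every row key and keep the ones that start with `prefix`."""
    return [key for key in cf if key.startswith(prefix)]


def delete_rows(cf: ColumnFamily, keys: Iterable[str]) -> int:
    """Issue one row-level delete per key; return how many rows were removed."""
    removed = 0
    for key in keys:
        if cf.pop(key, None) is not None:
            removed += 1
    return removed


cf = {"1050#a": {"hits": 3}, "1050#b": {"hits": 7}, "2000#a": {"hits": 1}}
doomed = find_row_keys(cf, "1050#")
removed = delete_rows(cf, doomed)
print(removed, sorted(cf))
```

The scan touches every key because the prefix is checked on the client side; whether a narrower key range can be requested depends on how the cluster orders its row keys, which this thread does not state.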

 

On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:



Hi,

 

I would like to know if there is a way to delete old/unused data easily?

 

I know about TTL, but it has 2 limitations:

 

- AFAIK, there is no TTL on counter columns
- TTL needs to be defined at write time, so it's too late for data already inserted.

 

I could also use a standard "delete", but it seems inappropriate for such a massive deletion.

 

In some cases, I don't know the row key and would like to delete all the rows starting with, let's say, "1050#..."

 

Even better, I understood that columns are always inserted in C* with (name, value, timestamp). So is it possible to delete all the data inserted in some CF between 2 dates or data older than 1 month?

 

Alain
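
For illustration, the "older than 1 month" selection Alain asks about could look something like the sketch below when done on the client side. Everything here is an assumption made for the example: the `(name, value, timestamp)` tuples are plain Python data rather than anything read from Cassandra, timestamps are taken to be microseconds since the Unix epoch, "1 month" is approximated as 30 days, and `to_micros` and `expired_columns` are hypothetical helper names. Whether counter columns expose a usable write timestamp at all is not established in this thread.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# (column name, value, write timestamp in microseconds since the epoch).
# The microsecond unit is an assumption made for this sketch.
Column = Tuple[str, int, int]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(moment: datetime) -> int:
    """Convert an aware datetime to whole microseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(microseconds=1)


def expired_columns(row: List[Column], cutoff: int) -> List[str]:
    """Return the names of columns written strictly before `cutoff`."""
    return [name for name, _value, written_at in row if written_at < cutoff]


now = datetime(2013, 2, 13, tzinfo=timezone.utc)
cutoff = to_micros(now - timedelta(days=30))  # "1 month" approximated as 30 days

row = [
    ("2013-01-01", 5, to_micros(datetime(2013, 1, 1, tzinfo=timezone.utc))),
    ("2013-02-10", 9, to_micros(datetime(2013, 2, 10, tzinfo=timezone.utc))),
]
old = expired_columns(row, cutoff)
print(old)
```

The names returned by `expired_columns` would then be the ones to delete explicitly, since, per Aaron's reply, there is no built-in way to delete by date range.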

 
