From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Deleting old items
Date: Mon, 18 Feb 2013 09:12:14 +1300

I'll email the docs people.

I believe they are saying "use compaction throttling rather than this", not "this does nothing".

Although I used this in the last month on a machine with very little RAM to limit compaction memory use.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/02/2013, at 7:05 AM, Alain RODRIGUEZ wrote:

> "Can you point to the docs?"
>
> http://www.datastax.com/docs/1.1/configuration/storage_configuration#max-compaction-threshold
>
> And thanks for the rest of your answers, once again ;-).
>
> Alain
>
>
> 2013/2/16 aaron morton
>> Is that a feature that could possibly be developed one day?
> No.
> Timestamps are essentially an internal implementation detail used to resolve different values for the same column.
>
>> With "min_compaction_level_threshold" did you mean "min_compaction_threshold"? If so, why should I do that, what are the advantages/drawbacks of reducing this value?
>
> Yes, min_compaction_threshold, my bad.
> If you have a wide row and delete a lot of values you will end up with a lot of tombstones. These may dramatically reduce the read performance until they are purged. Reducing the compaction threshold makes compaction happen more frequently.
>
>> Looking at the doc I saw that: "max_compaction_threshold: Ignored in Cassandra 1.1 and later.". How to ensure that I'll always keep a small number of SSTables then?
> AFAIK it's not.
> There may be some confusion about the location of the settings in CLI vs CQL.
> Can you point to the docs?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/02/2013, at 10:14 PM, Alain RODRIGUEZ wrote:
>
>> Hi Aaron, once again thanks for this answer.
>>> "So is it possible to delete all the data inserted in some CF between 2 dates or data older than 1 month?"
>> "No."
>>
>> Why is there no way of deleting or getting data using the internal timestamp stored alongside any inserted column (as described here: http://www.datastax.com/docs/1.1/ddl/column_family#standard-columns)? Is that a feature that could possibly be developed one day? It could be useful to delete old data or to bring to a dev cluster just the last week of data, for example.
>>
>> With "min_compaction_level_threshold" did you mean "min_compaction_threshold"? If so, why should I do that, what are the advantages/drawbacks of reducing this value?
>>
>> Looking at the doc I saw that: "max_compaction_threshold: Ignored in Cassandra 1.1 and later.". How to ensure that I'll always keep a small number of SSTables then? Why is this deprecated?
>>
>> Alain
>>
>>
>> 2013/2/12 aaron morton
>>> So is it possible to delete all the data inserted in some CF between 2 dates or data older than 1 month?
>> No.
>>
>> You need to issue row level deletes. If you don't know the row key you'll need to do range scans to locate them.
>>
>> If you are deleting parts of wide rows consider reducing the min_compaction_level_threshold on the CF to 2
>>
>> Cheers
>>
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ wrote:
>>
>>> Hi,
>>>
>>> I would like to know if there is a way to delete old/unused data easily?
>>>
>>> I know about TTL but there are 2 limitations of TTL:
>>>
>>> - AFAIK, there is no TTL on counter columns
>>> - TTL needs to be defined at write time, so it's too late for data already inserted.
>>>
>>> I also could use a standard "delete" but it seems inappropriate for such a massive delete.
>>>
>>> In some cases, I don't know the row key and would like to delete all the rows starting with, let's say, "1050#..."
>>>
>>> Even better, I understood that columns are always inserted in C* with (name, value, timestamp). So is it possible to delete all the data inserted in some CF between 2 dates or data older than 1 month?
>>>
>>> Alain

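The advice in this thread (issue row-level deletes, range-scan for keys you do not know, and let compaction purge the resulting tombstones) can be illustrated with a small toy model. This is only a sketch, not Cassandra client code: ToyColumnFamily, scan_keys, delete_row, compact and delete_rows_with_prefix are all hypothetical names invented for the illustration, and the store is a plain in-memory dict. It assumes that keys cannot be looked up by prefix (as with a hash-based partitioner), so every key is visited and filtered on the client side.

```python
from typing import Dict, Iterator, Optional, Tuple

# A cell is (value, timestamp). A value of None marks a tombstone left by a delete.
Cell = Tuple[Optional[str], int]


class ToyColumnFamily:
    """Hypothetical in-memory stand-in for a column family (NOT a Cassandra API)."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Cell]] = {}

    def _write(self, key: str, name: str, cell: Cell) -> None:
        row = self.rows.setdefault(key, {})
        current = row.get(name)
        # Timestamps only reconcile competing versions of the same column:
        # the newer write wins. Tie-breaking is not modelled faithfully here.
        if current is None or cell[1] >= current[1]:
            row[name] = cell

    def insert(self, key: str, name: str, value: str, ts: int) -> None:
        self._write(key, name, (value, ts))

    def delete_row(self, key: str, ts: int) -> None:
        # Simplified row-level delete: tombstone every column currently in the row.
        for name in list(self.rows.get(key, {})):
            self._write(key, name, (None, ts))

    def scan_keys(self) -> Iterator[str]:
        # Stand-in for a range scan over all rows when the keys are unknown.
        yield from list(self.rows)

    def live_columns(self, key: str) -> Dict[str, str]:
        return {n: v for n, (v, _) in self.rows.get(key, {}).items() if v is not None}

    def tombstone_count(self) -> int:
        return sum(1 for row in self.rows.values() for v, _ in row.values() if v is None)

    def compact(self) -> None:
        # Purge tombstones and drop rows that end up empty. Real Cassandra also
        # waits for gc_grace_seconds before purging; the toy ignores that.
        for key in list(self.rows):
            live = {n: c for n, c in self.rows[key].items() if c[0] is not None}
            if live:
                self.rows[key] = live
            else:
                del self.rows[key]


def delete_rows_with_prefix(cf: ToyColumnFamily, prefix: str, ts: int) -> int:
    """Scan every row key, delete the rows whose key starts with prefix."""
    deleted = 0
    for key in cf.scan_keys():
        if key.startswith(prefix):
            cf.delete_row(key, ts)
            deleted += 1
    return deleted


if __name__ == "__main__":
    cf = ToyColumnFamily()
    cf.insert("1050#a", "c1", "x", ts=1)
    cf.insert("2000#b", "c1", "z", ts=2)
    print("rows deleted:", delete_rows_with_prefix(cf, "1050#", ts=10))
    print("tombstones before compaction:", cf.tombstone_count())
    cf.compact()
    print("tombstones after compaction:", cf.tombstone_count())
```

In the toy, tombstones stay in the row until compact() runs, which mirrors why the thread suggests a lower min_compaction_threshold when large parts of wide rows are deleted: more frequent compaction gives tombstones more chances to be purged.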