From: "Weijun Li"
Reply-To: cassandra-user@incubator.apache.org
Subject: RE: Strategy to delete/expire keys in cassandra
Date: Wed, 24 Feb 2010 22:56:50 -0800
Message-ID: <007601cab5e7$b5e1fe00$21a5fa00$@com>

Hi Sylvain, I just noticed that you are the one who implemented the Expiring Column feature. Could you please help with my questions?

 

Should I just run a command (in the Cassandra 0.5 source folder?) like:

 

patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch

 

for all five of the patches in your ticket?
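A minimal sketch of applying a numbered patch series in order, assuming the patch files sit together in one directory and carry zero-padded prefixes like the `0001-` name above. The helper names `patch_commands` and `apply_series` are hypothetical, and the names of the other four patches are not given in this thread, so the code matches `*.patch` rather than guessing them. `--dry-run` is a GNU patch option.

```python
from pathlib import Path
import subprocess


def patch_commands(patch_dir, strip=1, dry_run=False):
    """Build one `patch` argv per *.patch file, in filename order.

    Zero-padded prefixes (0001-, 0002-, ...) make plain lexical sorting
    match the intended application order.
    """
    commands = []
    for patch_file in sorted(Path(patch_dir).glob("*.patch")):
        argv = ["patch", f"-p{strip}"]
        if dry_run:
            argv.append("--dry-run")  # GNU patch: check without modifying files
        argv += ["-i", str(patch_file.resolve())]
        commands.append(argv)
    return commands


def apply_series(patch_dir, source_root, dry_run=True):
    """Run each patch from the root of the source tree, stopping on the first failure."""
    for argv in patch_commands(patch_dir, dry_run=dry_run):
        subprocess.run(argv, cwd=source_root, check=True)
```

Running `apply_series` with `dry_run=True` first is a cheap way to see whether every patch in the series applies cleanly to the 0.5 tree before anything is modified.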

 

Also, what’s your opinion on extending ExpiringColumn to expire a key completely? Otherwise it will be difficult to track which rows in Cassandra are expired or old.

 

Thanks,

-Weijun

 

From: Weijun Li [mailto:weijunli@gmail.com]
Sent: Tuesday, February 23, 2010 6:18 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Strategy to delete/expire keys in cassandra

 

Thanks for the answer. A dumb question: how did you apply the patch file to the 0.5 source? The link you gave doesn't mention that the patch is for 0.5.

Also, this ExpiringColumn feature doesn't seem to expire keys/rows, meaning the number of keys will keep growing (even if you drop columns for them) unless you delete them. In your case, how do you manage deleting/expiring keys from Cassandra? Do you keep a list of keys somewhere and go through them once in a while?

Thanks,

-Weijun
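The "keep a list of keys and go through them once in a while" idea can be sketched as a small client-side index. This is only an illustration under stated assumptions: `ExpiryIndex` is a hypothetical helper that lives outside Cassandra, it is not part of the CASSANDRA-699 patch, and the caller is assumed to issue the actual row deletes for whatever keys it returns.

```python
import heapq


class ExpiryIndex:
    """Client-side record of (expiry_time, key) pairs, kept outside Cassandra."""

    def __init__(self):
        self._heap = []    # min-heap ordered by expiry time
        self._expiry = {}  # key -> most recently recorded expiry time

    def touch(self, key, expires_at):
        """Record (or refresh) the time at which `key` should be removed."""
        self._expiry[key] = expires_at
        heapq.heappush(self._heap, (expires_at, key))

    def pop_expired(self, now):
        """Return the keys whose recorded expiry time is <= now."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expires_at, key = heapq.heappop(self._heap)
            # Skip stale heap entries left behind by a later touch().
            if self._expiry.get(key) == expires_at:
                del self._expiry[key]
                expired.append(key)
        return expired
```

A periodic job would call `pop_expired(now)` and delete each returned row. The heap keeps each sweep proportional to the number of due entries rather than to the total number of keys.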

On Tue, Feb 23, 2010 at 2:26 AM, Sylvain Lebresne <sylvain@yakaz.com> wrote:

Hi,

Maybe the following ticket/patch may be what you are looking for:
https://issues.apache.org/jira/browse/CASSANDRA-699

It's flagged for 0.7 but as it breaks the API (and if I understand correctly
the release plan) it may not make it in cassandra before 0.8 (and the
patch will have to change to accommodate the change that will be
made to the internals in 0.7).

Anyway, what I can at least tell you is that I'm using the patch against
0.5 in a test cluster without problem so far.


> 3) Once keys are deleted, do you have to wait till next GC to clean
> them from disk or memory (suppose you don’t run cleanup manually)? What’s
> the strategy for Cassandra to handle deleted items (notify other replica
> nodes, cleanup memory/disk, defrag/rebuild disk files, rebuild bloom filter
> etc). I’m asking this because if the keys refresh very fast (i.e., high
> volume write/read and expiration is kind of short) how will the data file
> grow and how does this impact the system performance.

Items are deleted only during compaction, and you may actually have to
wait for the GCGraceSeconds before deletion. This value is configurable in
storage-conf.xml, but is 10 days by default. You can decrease this value
but because of consistency (and the fact that you have to at least wait for
compaction to occur) you will always have a delay before the actual delete
(all this is also true for the patch I mention above by the way). But when it's
deleted, it's just skipping the items during compaction, so it's really cheap.

--
Sylvain
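The deletion rule described in the reply above (deleted items are dropped only at compaction, and only after GCGraceSeconds has elapsed) can be illustrated with a toy model. This is a simplified sketch, not Cassandra's implementation: `Cell`, `is_purgeable`, and `compact` are hypothetical names, and the only figure taken from the thread is the 10-day default.

```python
from dataclasses import dataclass
from typing import Optional

GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 10 days, the default mentioned in the thread


@dataclass
class Cell:
    key: str
    value: Optional[str]              # None marks a deleted item (tombstone)
    deleted_at: Optional[int] = None  # deletion time in seconds, tombstones only


def is_purgeable(cell, now, gc_grace=GC_GRACE_SECONDS):
    """A tombstone may be dropped once the grace period has fully elapsed."""
    return (
        cell.value is None
        and cell.deleted_at is not None
        and now - cell.deleted_at >= gc_grace
    )


def compact(cells, now, gc_grace=GC_GRACE_SECONDS):
    """Copy live data forward and skip purgeable tombstones."""
    return [cell for cell in cells if not is_purgeable(cell, now, gc_grace)]
```

In this model the purge is just a filter applied while data is being rewritten anyway, which matches the point that the delete itself is cheap. The cost is the delay: nothing is reclaimed before both the grace period and the next compaction have passed.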

 
