From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Number of keyspaces
Date: Wed, 23 May 2012 22:09:29 +1200
Message-Id: <71E3A34F-5B42-4137-A7A0-0B44EE9AC151@thelastpickle.com>

> We were thinking of doing a major compaction after each year is 'closed off'.
Not a terrible idea. Years tend to happen annually, so their growth pattern is well understood.

> This would mean that compactions for the current year were dealing with a smaller amount of data and hence be faster and have less impact on a day-to-day basis.
Older data is compacted into higher tiers / generations, so it will not be included when compacting new data (background: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra). That said, there is a chance that at some point the big older files get compacted, i.e. if you get (by default) 4 x 100GB files, they will get compacted into 1.
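For what it's worth, the tiering behaviour described above can be sketched in a few lines of Python. This is only a rough illustration, not Cassandra's actual code: the function names, the 0.5x to 1.5x "similar size" window and the sample file sizes are assumptions made for the example; the threshold of 4 is the default mentioned above.

```python
from typing import List


def bucket_by_size(sizes_gb: List[float],
                   bucket_low: float = 0.5,
                   bucket_high: float = 1.5) -> List[List[float]]:
    """Group SSTable sizes into buckets of roughly similar size.

    A file joins a bucket when its size lies within
    [bucket_low, bucket_high] times the bucket's current average
    (an assumed similarity window, for illustration only).
    """
    buckets: List[List[float]] = []
    for size in sorted(sizes_gb):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if bucket_low * average <= size <= bucket_high * average:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new tier.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_gb: List[float],
                          min_threshold: int = 4) -> List[List[float]]:
    """Return the buckets holding at least min_threshold similar-sized files."""
    return [b for b in bucket_by_size(sizes_gb) if len(b) >= min_threshold]


# Three big "closed off" files plus a few small recent ones (hypothetical sizes).
sstables = [100, 100, 100, 1, 1, 2]
print(compaction_candidates(sstables))   # []

# A fourth file of similar size makes the old tier eligible for compaction.
sstables.append(100)
print(compaction_candidates(sstables))   # [[100, 100, 100, 100]]
```

With three 100GB files the old data is left alone while the small files churn among themselves; it is the fourth similar-sized file that makes the big ones eligible to be merged into 1.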

It feels a bit like a premature optimisation.
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 1:52 PM, Franc Carter wrote:

On Wed, May 23, 2012 at 7:42 AM, aaron morton <aaron@thelastpickle.com> wrote:
1 KS with 24 CF's will use roughly the same resources as 24 KS's with 1 CF. Each CF:

* loads the bloom filter for each SSTable
* samples the index for each SSTable
* uses row and key cache
* has a current memtable and potentially memtables waiting to flush
* has secondary index CF's
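
As a rough sketch of why the two layouts cost about the same, the per-CF items listed above can be tallied in Python. The function name and the figure of 10 SSTables per CF are hypothetical, and the tally only counts structures; it says nothing about how much memory each one actually takes.

```python
def cf_level_structures(keyspaces: int,
                        cfs_per_keyspace: int,
                        sstables_per_cf: int) -> dict:
    """Count the structures that exist per CF or per SSTable.

    The keyspace count only matters through the total number of CFs.
    """
    total_cfs = keyspaces * cfs_per_keyspace
    total_sstables = total_cfs * sstables_per_cf
    return {
        "column_families": total_cfs,
        "active_memtables": total_cfs,    # one current memtable per CF
        "bloom_filters": total_sstables,  # one bloom filter per SSTable
        "index_samples": total_sstables,  # one index sample per SSTable
    }


# 10 SSTables per CF is an assumed figure, used only for the comparison.
one_keyspace = cf_level_structures(keyspaces=1, cfs_per_keyspace=24, sstables_per_cf=10)
many_keyspaces = cf_level_structures(keyspaces=24, cfs_per_keyspace=1, sstables_per_cf=10)

print(one_keyspace == many_keyspaces)
```

Both layouts end up with 24 CFs, so every count in the tally is identical.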

I would generally avoid a data model that calls for CF's to be added in response to new entities or new data. Older data will be moved to larger files, and not included in compaction for newer data.

We were thinking of doing a major compaction after each year is 'closed off'. This would mean that compactions for the current year were dealing with a smaller amount of data and hence be faster and have less impact on a day-to-day basis. Our query patterns will only infrequently cross year boundaries.

Are we being naive?

cheers
 

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 23/05/2012, at 3:31 AM, Luís Ferreira wrote:

I have 24 keyspaces, each with a column family, and am considering changing it to 1 keyspace with 24 CFs. Would this be beneficial?
On May 22, 2012, at 12:56 PM, samal wrote:

Not ideally; Cassandra now has global memtable tuning. Each CF corresponds to memory in RAM. A year-wise CF means it will be in a read-only state for the next year, but its memtable will still consume RAM.

On 22-May-2012 5:01 PM, "Franc Carter" <franc.carter@sirca.org.au> wrote:
On Tue, May 22, 2012 at 9:19 PM, aaron morton <aaron@thelastpickle.com> wrote:
It's more the number of CF's than keyspaces.

Oh - does increasing the number of Column Families affect performance?

The design we are working on at the moment is considering using a Column Family per year. We were thinking this would isolate compactions to a more manageable size as we don't update previous years.

cheers
 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 22/05/2012, at 6:58 PM, R. Verlangen wrote:

Yes, it does. However, there's no real answer as to what the limit is: it depends on your hardware and cluster configuration.

You might even want to search the archives of this mailing list; I remember this has been asked before.

Cheers!

2012/5/21 Luís Ferreira <zamith.28@gmail.com>
Hi,

Does the number of keyspaces affect the overall Cassandra performance?


Regards,
Luís Ferreira






--
With kind regards,

Robin Verlangen





--
Franc Carter | Systems architect | Sirca Ltd
Level 9, 80 Clarence St, Sydney NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215

