From: Jake Luciani
To: user@cassandra.apache.org
Date: Sun, 1 May 2011 14:48:37 -0400
Subject: Re: Combining all CFs into one big one

If you have N column families you need N * memtable size of RAM to support
this. If that's not an option, you can merge them into one as you suggest,
but then you will have much larger SSTables, slower compactions, etc. I
don't necessarily agree with Tyler that the OS cache will be less
effective... But I do agree that if the sizes of SSTables are too large for
you, then more hardware is the solution...
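
To make the N * memtable figure concrete, here is a rough back-of-the-envelope sketch in Python. The 40 CFs come from the original question; the 64 MB memtable size and the function name are assumptions for illustration only, and the estimate ignores JVM overhead and every other use of the heap.

```python
def memtable_ram_mb(num_cfs, memtable_mb_per_cf):
    """Worst-case RAM (in MB) if every CF fills one memtable before flushing."""
    return num_cfs * memtable_mb_per_cf


# 40 CFs is the figure from the thread; 64 MB per memtable is an assumed value.
separate_cfs = memtable_ram_mb(40, 64)
merged_cf = memtable_ram_mb(1, 64)

print("40 separate CFs:", separate_cfs, "MB")
print("1 merged CF:    ", merged_cf, "MB")
```

Under these assumptions the separate layout has to budget for 40 memtables and the merged layout for one, which is the trade-off against larger SSTables and slower compactions.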

On Sun, May 1, 2011 at 1:24 PM, Tyler Hobbs <tyler@datastax.com> wrote:
> When you have a high number of CFs, it's a good idea to consider merging
> CFs with highly correlated access patterns and similar structure into one.
> It is *not* a good idea to merge all of your CFs into one (unless they all
> happen to meet these criteria). Here's why:
>
> Besides big compactions and long repairs that you can't break down into
> smaller pieces, the main problem here is that your caching will become much
> less efficient. The OS buffer cache will be less effective because rows from
> all of the CFs will be interspersed in the SSTables. You will no longer be
> able to tune the key or row cache to only cache frequently accessed data.
> Both of these will tend to cause a serious increase in latency for your hot
> data.
>
>> Shouldn't these kinds of problems be solved by Cassandra?
>
> They are mainly solved by Cassandra's general solution to any performance
> problem: the addition of more nodes. There are tickets open to improve
> compaction strategies, put bounds on SSTable sizes, etc.; for example,
> https://issues.apache.org/jira/browse/CASSANDRA-1608, but the addition of
> more nodes is a reliable solution to problems of this nature.

> On Sun, May 1, 2011 at 7:28 AM, David Boxenhorn <david@taotown.com> wrote:
>> Shouldn't these kinds of problems be solved by Cassandra? Isn't there a
>> maximum SSTable size?
>>
>> On Sun, May 1, 2011 at 3:24 PM, shimi <shimi.k@gmail.com> wrote:
>>> Big SSTables, long compactions, and in a major compaction you will need
>>> free disk space equal to the size of all the SSTables (which you should
>>> have anyway).
>>>
>>> Shimi
>>>
>>> On Sun, May 1, 2011 at 2:03 PM, David Boxenhorn <david@taotown.com> wrote:
>>>> I'm having problems administering my cluster because I have too many CFs
>>>> (~40).
>>>>
>>>> I'm thinking of combining them all into one big CF. I would prefix the
>>>> current CF name to the keys, repeat the CF name in a column, and index the
>>>> column (so I can loop over all rows, which I have to do sometimes, for some
>>>> CFs).
>>>>
>>>> Can anyone think of any disadvantages to this approach?
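
The layout proposed in the original question (CF name prefixed to the key, CF name repeated in an indexed column) might look roughly like this. It is a sketch only: a plain Python dict stands in for the merged column family, a linear scan stands in for the secondary-index lookup, and the ":" separator, the "cf_name" column, and all function names are hypothetical. No real Cassandra client calls are used.

```python
SEPARATOR = ":"        # hypothetical delimiter; assumed never to occur in a CF name
CF_COLUMN = "cf_name"  # hypothetical name of the indexed column


def merged_key(cf_name, row_key):
    """Prefix the original CF name to the row key."""
    return cf_name + SEPARATOR + row_key


def split_key(key):
    """Recover (cf_name, row_key); splits on the first separator only."""
    cf_name, _, row_key = key.partition(SEPARATOR)
    return cf_name, row_key


def merged_row(cf_name, columns):
    """Copy the row's columns and repeat the CF name in its own column."""
    row = dict(columns)
    row[CF_COLUMN] = cf_name
    return row


def rows_for(store, cf_name):
    """Stand-in for a secondary-index query: all rows of one logical CF."""
    return {k: v for k, v in store.items() if v[CF_COLUMN] == cf_name}


big_cf = {}  # a dict standing in for the single merged column family
big_cf[merged_key("Users", "alice")] = merged_row("Users", {"email": "alice@example.com"})
big_cf[merged_key("Orders", "1001")] = merged_row("Orders", {"total": "42"})

# Loop over all rows that used to live in the "Users" CF.
for key, row in rows_for(big_cf, "Users").items():
    print(split_key(key), row)
```

Note that in this layout rows from every logical CF share the same store, so the caching and SSTable-size concerns raised elsewhere in the thread apply to all of them at once.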






> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library



--
http://twitter.com/tjake