Date: Sun, 6 Feb 2011 09:56:28 +0530
Subject: Re: Merging the rows of two column families(with similar attributes) into one ??
From: Ertio Lew
To: user@cassandra.apache.org

Thanks Tyler!

I think I'll have to carefully take all these factors into consideration before deciding how to split my data into CFs, as this cannot have an objective answer. I am expecting at least 8 column families for my entire application if I split the data strictly according to the various features and requirements of the application.

I think there should have been a provision for specifying, on a per-query basis, which rows should be cached while you're reading them from a row_cache-enabled CF. Thus you could easily merge similar data for different features of your application into a single CF.
I believe this would also have led to much more efficient use of the cache space (if you were using the same data for different parts of your app which have different caching needs).

Regards,
Ertio

On Sun, Feb 6, 2011 at 1:22 AM, Tyler Hobbs wrote:
>> if you have under control parameters like
>> memtable_throughput & memtable_operations which are set per column
>> family basis then you can directly control & adjust by splitting the
>> memory space between two CFs in proportion to what you would do in
>> single CF.
>> Hence there should be no extra memory consumption for multiple CFs
>> that have been split from single one??
>
> Yes, I think you have the right idea here. There is a small amount of
> overhead for the extra memtable and keeping track of a second set of
> indexes, bloom filters, sstables, etc.
>
>> Regarding the compactions, I think even if they are more the size of
>> the SST files to be compacted is smaller as the data has been split
>> into two.
>> Then more compactions but smaller too!!
>
> Yes.
>
>> if some CF is written less often as compared to other CFs, then the
>> memtable would consume space in the memory until it is flushed, this
>> memory space could have been much better used by a CF that's heavily
>> written and read. And if you try to make the thresholds for flush
>> smaller then more compactions would be needed.
>
> If you merge the two CFs together, then updates to the 'less frequent' rows
> will still consume memory, only it will all be within one memtable.
> (Memtables grow in size until they are flushed, they don't reserve some set
> amount of memory.) Furthermore, because your memtables will be filled up by
> the 'more frequent' rows, the 'less frequent' rows will get fewer
> updates/overwrites in memory, so they will tend to be spread across a
> greater number of SSTables.
>
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
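
To make the proportional split discussed in the quoted text concrete, here is a rough sketch of the arithmetic only. It does not call Cassandra or pycassa. The function name, the CF names ("HotCF", "ColdCF"), the write shares, and the threshold values are all hypothetical; the two keys simply reuse the parameter names mentioned above (memtable_throughput, memtable_operations).

```python
def split_memtable_thresholds(total_throughput, total_operations, write_shares):
    """Divide one CF's flush thresholds among several CFs by write share.

    total_throughput / total_operations: the thresholds you would have used
    for the single merged CF (hypothetical values, any consistent unit).
    write_shares: mapping of CF name -> relative write volume (assumed known).
    """
    total_share = sum(write_shares.values())
    if total_share <= 0:
        raise ValueError("write shares must sum to a positive number")

    result = {}
    for cf_name, share in write_shares.items():
        fraction = share / total_share
        result[cf_name] = {
            # Each CF gets its proportional slice of the original budget,
            # so the combined thresholds equal the single-CF thresholds.
            "memtable_throughput": total_throughput * fraction,
            "memtable_operations": total_operations * fraction,
        }
    return result


if __name__ == "__main__":
    # Hypothetical example: one CF receives three times the writes of the other.
    thresholds = split_memtable_thresholds(64, 1.0, {"HotCF": 3, "ColdCF": 1})
    for cf_name, values in thresholds.items():
        print(cf_name, values)
```

Note that this only keeps the sum of the flush thresholds constant. As Tyler points out, each extra CF still carries a small fixed overhead (its own memtable, indexes, bloom filters, and SSTables) that the sketch does not model.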