Subject: Re: Abnormal memory consumption
From: Peter Schuller
To: user@cassandra.apache.org
Cc: openvictor Open
Date: Tue, 5 Apr 2011 09:29:14 +0200

> Okay, I see. But isn't there a big issue for scaling here?
> Imagine that I am the developer of a very successful website: in
> year 1 I need 20 CFs and might need 8 GB of RAM. In year 2 I need 50 CFs
> because I added functionality to my wonderful website; will I need 20 GB of
> RAM? And if in year 3 I have 300 column families, will I need 120 GB of
> RAM per node? Or did I miss something about memory consumption?

It's up to you to size the memtable thresholds appropriately. The
primary driver for memtable threshold size is the desire to avoid
future compaction work by making the flushed memtables larger.

As such, a larger memtable threshold is typically only relevant for
column families that see a lot of writes. So, if you have 50 column
families of which 2 are written very frequently and the remainder only
rarely, there will probably be little motivation to give the remainder
memtable thresholds of any significant size.

If you truly have a lot of column families, all of which receive an
equal amount of traffic, then to some extent it is a scaling issue, in
the sense that you'd be forced to use lower memtable thresholds for
each column family than you otherwise would. The result of that is
additional compaction work (meaning less sustainable write
throughput). But you won't be forced to have 120 GB nodes (a 120 GB
heap would be problematic for other reasons anyway).

-- 
/ Peter Schuller
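
To make the sizing trade-off above concrete, here is a rough sketch of
splitting a fixed memtable budget across column families in proportion
to their write traffic. It is only an illustration: the function name,
the per-CF floor, the 1024 MB budget and the write rates are all made-up
assumptions, and nothing here is a Cassandra API or setting.

```python
def split_memtable_budget(total_mb, write_rates, floor_mb=1.0):
    """Split total_mb across column families by write share.

    Hypothetical helper, not part of Cassandra. Every column family
    gets at least floor_mb; the rest of the budget is divided in
    proportion to each column family's share of the writes.
    """
    count = len(write_rates)
    if floor_mb * count > total_mb:
        raise ValueError("budget too small for the per-CF floor")
    remaining = total_mb - floor_mb * count
    total_rate = sum(write_rates.values())
    budget = {}
    for name, rate in write_rates.items():
        # With no writes at all, fall back to an even split.
        share = rate / total_rate if total_rate else 1.0 / count
        budget[name] = floor_mb + remaining * share
    return budget


# Skewed case: 2 busy column families, 48 rarely written (made-up rates).
skewed_rates = {"busy_a": 5000, "busy_b": 5000}
skewed_rates.update({"rare_%d" % i: 1 for i in range(48)})
skewed = split_memtable_budget(1024, skewed_rates)

# Uniform case: 300 column families with equal traffic.
uniform_rates = {"cf_%d" % i: 100 for i in range(300)}
uniform = split_memtable_budget(1024, uniform_rates)
```

In the skewed case nearly all of the budget goes to the two busy
column families and the rest stay close to the floor. In the uniform
case every column family ends up with a small threshold, which is
where the extra compaction work comes from, but the total does not
grow with the number of column families.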