From: CassUser CassUser <cassuser@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 20 Oct 2010 13:17:43 -0700
Subject: Re: memtable sstable questions (0.6.4)

Cool, thanks, that helps.

So even if we have defined a column family in storage-conf and it is empty,
it still has some overhead in Cassandra, and the following rule should
apply:

  memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal caches
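For concreteness, that rule can be written out as a small Python helper. This
is only a minimal sketch: the figures in the example call (64 MB memtables,
20 hot CFs, roughly 200 MB of key/row caches) are made-up illustrations, not
recommendations, and the "1G" term is taken as 1024 MB.

# Sketch of the heap-sizing rule quoted above; example inputs are hypothetical.
def estimate_heap_mb(memtable_throughput_mb, hot_column_families, internal_caches_mb):
    """memtable_throughput_in_mb * 3 * number of hot CFs + 1 GB + internal caches."""
    return memtable_throughput_mb * 3 * hot_column_families + 1024 + internal_caches_mb

# e.g. 64 MB memtables, 20 hot CFs, ~200 MB of caches (made-up figures)
print(estimate_heap_mb(64, 20, 200))  # -> 5064, i.e. roughly a 5 GB heap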
On Wed, Oct 20, 2010 at 12:53 PM, Aaron Morton <aaron@thelastpickle.com> wrote:

> Take a look at the section on JVM heap size here:
> http://wiki.apache.org/cassandra/MemtableThresholds
>
> CFs have a large overhead; keyspaces have little to none.
>
> In general, write performance will be affected by the memtable thresholds
> (also covered at the link above). Read performance will be affected by the
> size of the Cassandra caches and the OS file caches. Compaction can slow a
> node down; 0.7 handles this better via the dynamic snitch.
>
> Start with conservative / default values, then crank things up.
>
> Aaron
>
> On 21 Oct 2010, at 08:42 AM, CassUser CassUser <cassuser@gmail.com> wrote:
>
> Thanks for the link.
>
> #2 was not meant to be a trick question, it just came out like that :).
> What I was after is the overhead associated with a large number of
> keyspaces and column families (I didn't mean empty memtables :). Say a few
> keyspaces that each have 20 or so column families with a percentage of
> rows cached: does this affect write performance to other keyspaces in the
> cluster?
>
> On Wed, Oct 20, 2010 at 12:01 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
>> On Wed, Oct 20, 2010 at 2:47 PM, CassUser CassUser <cassuser@gmail.com> wrote:
>> > Hey,
>> >
>> > As I understand it, writes go directly to the commit log. Once a
>> > threshold has been reached, the data is shipped to a memtable, and
>> > again to an sstable.
>> >
>> > 1. How many memtables are created when a flush happens from a commit
>> > log? One per CF?
>> >
>> > 2. Is there any space associated with an empty memtable?
>> >
>> > 3. When a flush happens from a memtable to an sstable, does this
>> > create a single new sstable?
>> >
>> > 4. Should compaction be turned off during a large data load?
>> >
>> > Thanks.
>>
>> Take a look at:
>>
>> http://wiki.apache.org/cassandra/MemtableSSTable
>>
>> 1 and 3:
>> Memtables flush for three reasons: size, time, and number of
>> operations. There is one memtable per column family. Each memtable
>> flushes individually.
>>
>> 2. Is this a trick question?
>>
>> 4. Should compaction be turned off during a large data load?
>> You can disable compaction during bulk loads. This can help because
>> otherwise the same data might be compacted multiple times. However, if
>> you go too long with compaction turned off you end up with many
>> sstables, which can leave you with fragmented rows.
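To make the answers to 1 and 3 concrete, here is a toy Python model of the
write path described in this thread: a mutation is appended to a commit log,
applied to the per-column-family memtable, and the memtable is flushed to a
single new sstable when any one of the three thresholds (size, operation
count, age) is crossed. The thresholds, names and data structures below are
made up for illustration; this is not Cassandra's actual implementation.

import time

class ToyColumnFamily:
    """Toy write path: commit log first, then a per-CF memtable that flushes
    to a new 'sstable' when size, operation count, or age crosses a threshold."""

    def __init__(self, name, throughput_bytes, max_ops, max_age_secs, commit_log):
        self.name = name
        self.commit_log = commit_log
        self.throughput_bytes = throughput_bytes
        self.max_ops = max_ops
        self.max_age_secs = max_age_secs
        self.sstables = []  # each flush produces one new, immutable sstable
        self._reset_memtable()

    def _reset_memtable(self):
        self.memtable = {}
        self.memtable_bytes = 0
        self.memtable_ops = 0
        self.memtable_created = time.time()

    def write(self, row_key, column, value):
        # 1. Durability first: the mutation is appended to the commit log.
        self.commit_log.append((self.name, row_key, column, value))
        # 2. It is then applied to this column family's memtable.
        self.memtable.setdefault(row_key, {})[column] = value
        self.memtable_bytes += len(column) + len(value)
        self.memtable_ops += 1
        # 3. Any one of the three thresholds triggers a flush of this memtable only.
        if (self.memtable_bytes >= self.throughput_bytes
                or self.memtable_ops >= self.max_ops
                or time.time() - self.memtable_created >= self.max_age_secs):
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(self.memtable)  # one flush -> one new sstable
        self._reset_memtable()

commit_log = []
cf = ToyColumnFamily("Users", throughput_bytes=64 * 1024, max_ops=1000,
                     max_age_secs=3600, commit_log=commit_log)
for i in range(5000):
    cf.write("user%d" % i, "name", "name-%d" % i)
print(len(cf.sstables), "sstables flushed,", len(commit_log), "commit log writes")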
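And to illustrate the last point about fragmented rows: with compaction
disabled for a long load, a row whose columns arrive across several flushes
ends up spread over several sstables, so every read has to merge all of
them, and compaction is what folds them back into fewer files. Again, this
is only a toy sketch of the merge idea, not the real read or compaction
path.

def read_row(sstables, row_key):
    """Merge a row's columns from every sstable (later sstables win per column)."""
    merged = {}
    for sstable in sstables:  # every extra un-compacted sstable is extra work per read
        merged.update(sstable.get(row_key, {}))
    return merged

def compact(sstables):
    """Toy major compaction: fold all sstables into a single merged one."""
    merged = {}
    for sstable in sstables:
        for row_key, columns in sstable.items():
            merged.setdefault(row_key, {}).update(columns)
    return [merged]

# The same row written across three flushes is fragmented over three sstables.
sstables = [{"user1": {"name": "a"}},
            {"user1": {"email": "a@example.com"}},
            {"user1": {"name": "b"}}]
print(read_row(sstables, "user1"))                 # {'name': 'b', 'email': 'a@example.com'}
sstables = compact(sstables)
print(len(sstables), read_row(sstables, "user1"))  # 1 {'name': 'b', 'email': 'a@example.com'}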