From: Aaron Morton <aaron@thelastpickle.com>
Subject: Re: Cassandra as storage for cache data
Date: Thu, 27 Jun 2013 16:51:40 +1200
To: user@cassandra.apache.org

I'll also add that you are probably running into some memory issues; 2.5 GB is a low heap size.

> -Xms2500M -Xmx2500M -Xmn400M

If you really do have a cache and want to reduce the disk activity, disable durable_writes on the keyspace. That will stop the writes from going to the commit log, which is one reason memtables are flushed to disk. The other reason is that memory usage approaches the memtable_total_space_in_mb setting. Modern (1.2) releases are very good at managing that memory, provided the jamm memory meter is working. With this approach and the other tips below you should be able to get better performance.

WARNING: disabling durable_writes means that writes are only in memory and will not be committed to disk until the CFs are flushed. You should *always* run nodetool drain before shutting down a node in this case.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/06/2013, at 8:52 AM, sankalp kohli wrote:

> Apart from what Jeremy said, you can try these:
> 1) Use replication = 1. It is cache data and you don't need persistence.
> 2) Try playing with memtable size.
> 3) Use the Netflix client library, as it will reduce one hop: it will choose the node that holds the data as the coordinator.
> 4) Work on your schema. You might want to have fewer columns in each row. With fatter rows, the bloom filter will report more sstables as eligible.
>
> -Sankalp
>
>
> On Tue, Jun 25, 2013 at 9:04 AM, Jeremy Hanna wrote:
> If you have rapidly expiring data, then tombstones are probably filling your disk and your heap (depending on how you order the data on disk). To check whether your queries are affected by tombstones, you might try the query tracing that's built in to 1.2.
> See:
> http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets -- has an example of tracing where you can see tombstones affecting a query
> http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
>
> You'll want to consider reducing the gc_grace period from the default of 10 days for those column families - with an understanding of why gc_grace exists in the first place; see http://wiki.apache.org/cassandra/DistributedDeletes . Even once the gc_grace period has passed, the tombstones will stay around until they are compacted away. So there are currently two options to compact them away more quickly:
> 1) Use leveled compaction - see http://www.datastax.com/dev/blog/when-to-use-leveled-compaction . Leveled compaction only requires 10% headroom (as opposed to 50% for size-tiered compaction) for the amount of disk that needs to be kept free.
> 2) If 1 doesn't work and you're still seeing performance degrade because tombstones aren't getting cleared out fast enough, you might consider keeping size-tiered compaction but performing regular major compactions to get rid of expired data.
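[Editor's note] The per-column-family changes Jeremy describes, together with Aaron's durable_writes tip earlier in the thread, can be applied from cqlsh roughly as follows. This is only a sketch against CQL 3 as of Cassandra 1.2, and the keyspace and table names (`cache_ks`, `cache_items`) are made up for illustration:

```sql
-- Hypothetical names: a keyspace 'cache_ks' with a table 'cache_items'.

-- Aaron's tip: skip the commit log for pure cache data.
-- (Remember: always run `nodetool drain` before stopping a node.)
ALTER KEYSPACE cache_ks WITH durable_writes = false;

-- Jeremy's tips: a much shorter gc_grace and leveled compaction,
-- so expired entries and their tombstones are purged sooner.
ALTER TABLE cache_ks.cache_items
  WITH gc_grace_seconds = 3600
  AND compaction = {'class': 'LeveledCompactionStrategy'};
```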
>
> Keep in mind, though, that if you use a gc_grace of 0 and do any kind of manual deletes outside of TTLs, you probably want to do the deletes at ConsistencyLevel.ALL; otherwise, if a node goes down and then comes back up, there's a chance that deleted data may be resurrected. That only applies to non-TTL data that you delete manually. See the explanation of distributed deletes for more information.
>
>
> On 25 Jun 2013, at 13:31, Dmitry Olshansky wrote:
>
> > Hello,
> >
> > We are using Cassandra as the data storage for our caching system. Our application generates about 20 put and get requests per second. The average size of one cache item is about 500 KB.
> >
> > Cache items are placed into one column family with a TTL of 20-60 minutes. Keys and values are bytes (not UTF-8 strings). The compaction strategy is SizeTieredCompactionStrategy.
> >
> > We set up a Cassandra 1.2.6 cluster of 4 nodes. The replication factor is 2. Each node has 10 GB of RAM and enough space on HDD.
> >
> > Now, when we put this cluster under load, it quickly fills with our runtime data (about 5 GB on every node) and we start observing performance degradation, with frequent timeouts on the client side.
> >
> > We see that on each node compaction starts very frequently and takes several minutes to complete. It seems that each node is usually busy with the compaction process.
> >
> > Here are the questions:
> >
> > What is the recommended configuration for our use case?
> >
> > Does it make sense to somehow tell Cassandra to keep all data in memory (memtables) to eliminate flushing it to disk (sstables), thus decreasing the number of compactions? How can we achieve this behavior?
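[Editor's note] A back-of-envelope calculation shows why the disk fills up here and why reducing gc_grace matters so much. The inputs are rough figures assumed from this thread (up to ~20 writes/s, ~500 KB per item, RF=2, 4 nodes, a worst-case 60-minute TTL), not measured values:

```python
# Rough estimate of on-disk footprint per node: expired cache items
# persist as tombstone-bearing data until gc_grace passes AND a
# compaction rewrites the relevant sstables, so the retention window
# is roughly TTL + gc_grace.
WRITE_RATE = 20        # puts per second (upper-bound assumption)
ITEM_KB = 500          # average item size in KB
TTL_S = 60 * 60        # worst-case 60-minute TTL
DAY_S = 24 * 60 * 60

def gb_per_node(gc_grace_s, rf=2, nodes=4):
    """Approximate GB retained per node before compaction can purge it."""
    retained_s = TTL_S + gc_grace_s
    total_kb = WRITE_RATE * ITEM_KB * retained_s * rf
    return total_kb / 1024 / 1024 / nodes

print(round(gb_per_node(10 * DAY_S)))  # default 10-day gc_grace: ~4137 GB/node
print(round(gb_per_node(3600)))        # 1-hour gc_grace: ~34 GB/node
```

Even as a crude upper bound this ignores compaction headroom; the point is that with the default gc_grace, expired cache entries would dominate the disk by two orders of magnitude.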
> >
> > Cassandra is started with the default shell script, which produces the following command line:
> >
> > jsvc.exec -user cassandra -home /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/../ -pidfile /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log -cp <CLASSPATH_SKIPPED> -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -XX:HeapDumpPath=/var/lib/cassandra/java_1371805844.hprof -XX:ErrorFile=/var/lib/cassandra/hs_err_1371805844.log -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms2500M -Xmx2500M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false org.apache.cassandra.service.CassandraDaemon
> >
> > --
> > Best regards,
> > Dmitry Olshansky
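[Editor's note] Given Aaron's warning about durable_writes, the safe shutdown sequence on a node looks roughly like this. It is an operational sketch assuming a stock package install; adjust the host and service name for your environment:

```shell
# Flush memtables to sstables and stop accepting writes first;
# with durable_writes off there is no commit log to replay, so
# anything not flushed here is lost.
nodetool -h localhost drain

# Only then stop the daemon.
sudo service cassandra stop
```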