From: aaron morton <aaron@thelastpickle.com>
Subject: Re: Collecting of tombstones columns during read query fills up heap
Date: Fri, 11 Jan 2013 09:01:03 +1300
To: user@cassandra.apache.org

> So, one column represents a file in that directory and it has no value.

Just so I understand, the file contents are *not* stored in the column value?

> Basically the heap fills up and if several queries happen simultaneously, the heap is exhausted and the node stops.

Are you seeing the GCInspector log messages? Are they ParNew or CMS collections?
If you want more insight into what the JVM is doing, enable the GC logging options in cassandra-env.sh.

> Dumping the SSTables shows that there were a lot of tombstones between those 2 columns.

How many is a lot?

> Normally I run with an 8GB heap and have no problems, but problematic queries can fill up the heap even if I bump it up to 24GB. The machines have 32GB.

For queries like this it's (usually) not the overall size of the JVM heap, Xmx, that matters.
It's the size of the new heap (HEAP_NEWSIZE in cassandra-env.sh), which sets Xmn, and the other new-generation settings, SurvivorRatio and MaxTenuringThreshold. What settings do you have for those?
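For reference, these are the sorts of lines to look at in cassandra-env.sh. This is a sketch only; the exact flag set and defaults vary by Cassandra version, and the sizes below are illustrative, not recommendations:

```shell
# GC logging flags (shipped commented out in cassandra-env.sh; uncomment
# or add lines like these to see what ParNew/CMS are actually doing).
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

# Heap sizing: MAX_HEAP_SIZE sets -Xmx, HEAP_NEWSIZE sets -Xmn.
# SurvivorRatio and MaxTenuringThreshold control how long objects stay
# in the young generation before being promoted to the old generation.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
```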
> Of course, the problem goes away after gc_grace_seconds pass and I run a manual compact on that CF; the tombstones are removed and queries to that row are efficient again.

If you have a CF with a high number of overwrites or deletions, using Levelled Compaction can help. It does use somewhat more IO than size-tiered compaction, but it's designed for these sorts of situations. See http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra and http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

Schema-wise, you could try having multiple "directory" rows for each user. At certain times you create a new row, which then receives all the writes, but you read (and delete if necessary) from all rows. Then migrate the data from the old rows to the new one and remove the old rows.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/01/2013, at 12:37 AM, André Cruz <andre.cruz@co.sapo.pt> wrote:

> Hello.
>
> I have a schema to represent a filesystem for my users. In this schema one of the CFs stores a directory listing this way:
>
> CF DirList
>
> Dir1:
>     File1:NOVAL File2:NOVAL ...
>
> So, one column represents a file in that directory and it has no value. The file metadata is stored elsewhere. When listing the contents of a directory I fetch the row contents in batches (using pycassa's column_count and column_start) and always limit the number of columns that I want returned, so as not to occupy too much memory on the Cassandra server. However, if a certain user has deleted a lot of files in that dir and so has a lot of tombstones, even fetching with a column_count of 2 can pose problems for the Cassandra server. Basically the heap fills up and if several queries happen simultaneously, the heap is exhausted and the node stops. Dumping the SSTables shows that there were a lot of tombstones between those 2 columns.
>
> Is there anything, other than schema changes or throttling on the application side, that I can do to prevent problems like these? Basically I would like Cassandra to stop a query if the resultset already has X items, whether they are tombstones or not, and return an error. Or maybe it could stop if the resultset already occupies more than Y bytes or the heap is almost full. Some safety valve to prevent a DoS.
>
> I should point out that I am using 1.1.5, but I have not seen anything in the changelog that may reference this issue in more recent releases. Normally I run with an 8GB heap and have no problems, but problematic queries can fill up the heap even if I bump it up to 24GB. The machines have 32GB.
>
> Of course, the problem goes away after gc_grace_seconds pass and I run a manual compact on that CF; the tombstones are removed and queries to that row are efficient again.
>
> Thanks,
> André Cruz
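To make the cost André describes concrete, here is a toy model of a column slice in plain Python. This is not pycassa or Cassandra internals, just an illustration of the mechanism: even with column_count=2, the read has to walk over every tombstone sitting between the live columns before it can fill the page:

```python
# Toy model of a Cassandra 1.1-style column slice. Columns in a row are
# sorted by name; deleted columns leave tombstones that are still scanned
# (and held in memory) until gc_grace_seconds pass and compaction purges them.

def slice_columns(row, column_start="", column_count=2):
    """Return up to column_count live column names, plus how many cells
    (live + tombstone) had to be scanned to produce them."""
    live, scanned = [], 0
    for name, value in row:  # row is a list of (name, value) pairs sorted by name
        if name < column_start:
            continue
        scanned += 1
        if value is None:  # None marks a tombstone in this toy model
            continue
        live.append(name)
        if len(live) == column_count:
            break
    return live, scanned

# A directory row where files 0001..0999 were deleted (tombstoned)
# and only file0000 and file1000 are still live.
row = [("file%04d" % i, "NOVAL" if i in (0, 1000) else None)
       for i in range(1001)]

live, scanned = slice_columns(row, column_count=2)
print(live)     # ['file0000', 'file1000']
print(scanned)  # 1001 cells touched to return just 2 live columns
```

This is why a tiny column_count does not bound the server-side work: the cost is driven by the number of cells between column_start and the live data, not by the number of live columns returned.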
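For completeness, on the 1.1 line the compaction switch Aaron suggests can be made online from cassandra-cli (a sketch; the CF name is from the example above, and existing SSTables are then reorganised into levels in the background):

```
[default@MyKeyspace] update column family DirList
    with compaction_strategy = 'LeveledCompactionStrategy';
```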