From: aaron morton <aaron@thelastpickle.com>
Subject: Re: get_slice on wide rows
Date: Tue, 21 Aug 2012 21:46:56 +1200
To: user@cassandra.apache.org

> Is the problem that cassandra is attempting to load all the deleted columns into memory?

Yup.

The talk by Mat Dennis at the Cassandra Summit may be of interest to you. He talks about similar things: http://www.datastax.com/events/cassandrasummit2012/presentations

Drop gc_grace_seconds to 1 so that tombstones can be purged faster. A column-level tombstone still has to hit disk so that it overwrites any existing column on disk (so setting gc_grace_seconds to 0 has pretty much the same effect).

You may also want to try leveled compaction on the CF: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

> Is the solution here partitioning the wide row into multiple narrower rows?

That's also sensible. I would give the approach above a try first; it may give you more bang for your buck.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/08/2012, at 4:49 AM, feedly team <feedlydev@gmail.com> wrote:

> I have a column family that I am using for consistency purposes. Basically a marker column is written to a row in this family before some actions take place and is deleted only after all the actions complete.
> The idea is that if something goes horribly wrong this table can be read to see what needs to be fixed.
>
> In my dev environment things worked as planned, but in a larger scale/high traffic environment, the slice query times out and then cassandra quickly runs out of memory. The main difference here is that there is a very large number of writes (and deleted columns) in the row my code is attempting to read. Is the problem that cassandra is attempting to load all the deleted columns into memory? I did an sstable2json dump and saw that the "d" deletion marker seemed to be present for the columns, though I didn't write any code to check all values. Is the solution here partitioning the wide row into multiple narrower rows?
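For reference, the two CF-level changes suggested above (lower gc_grace_seconds, switch to leveled compaction) could be applied from cassandra-cli along these lines. This is a sketch for the 1.x-era cli; the column family name `Markers` is a stand-in for your own:

```
update column family Markers with gc_grace = 1;
update column family Markers with compaction_strategy = 'LeveledCompactionStrategy';
```

Remember that lowering gc_grace_seconds shortens the window in which a down node can miss a delete and later resurrect the column, so it mainly suits data (like these markers) you can afford to re-check.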

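The row-partitioning idea from the last question is usually done by bucketing the row key, e.g. by time, so each marker row stays narrow and old, fully-deleted buckets drop out of the read path entirely. A minimal sketch (all names hypothetical):

```python
def bucketed_row_key(base_key, timestamp, bucket_seconds=3600):
    # Derive a row key that rolls over every bucket_seconds, so marker
    # columns land in many narrow rows instead of one ever-growing row.
    bucket = int(timestamp) // bucket_seconds
    return "%s:%d" % (base_key, bucket)

def buckets_in_range(base_key, start_ts, end_ts, bucket_seconds=3600):
    # Row keys to slice when scanning for leftover markers
    # between two timestamps (inclusive of both end buckets).
    first = int(start_ts) // bucket_seconds
    last = int(end_ts) // bucket_seconds
    return ["%s:%d" % (base_key, b) for b in range(first, last + 1)]
```

A cleanup job then reads only the recent buckets it cares about, so tombstones in old buckets are never pulled into memory by a slice.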