From: Rustam Aliyev
Date: Sun, 10 Jun 2012 23:49:57 +0100
To: user@cassandra.apache.org
Subject: Re: Can't delete from SCF wide row
Message-ID: <4FD52495.3090206@code.az>

Hi Aaron,

Thanks for the reply. I ran some more tests, and it looks like the problem is not in the deletes/writes but rather in the reads (I do a read before deleting).

It turns out that the problem was in another CF, which had a 1.2GB wide row and the row cache enabled. Cassandra tries to read this row into the cache and becomes unresponsive. Disabling the row cache on this CF let me read through the row and perform the cleanup. It seems that Cassandra reads all columns into the cache, even those that were deleted (with tombstones) but not yet GCed.

It seems that CASSANDRA-2864 and CASSANDRA-1956 were opened to address this problem.

Best,
Rustam.


On 04/06/2012 19:41, aaron morton wrote:
A delete is a no-look write operation, like a normal write, so it should not be directly causing a lot of memory allocation.

It may be causing a lot of compaction activity, which, because of the wide row, may be generating a lot of GC.

Try the following to get through the deletions:

* disable compaction by setting min_compaction_threshold and max_compaction_threshold to 0 (via nodetool on current versions)

Once you have finished the deletions:
* lower in_memory_compaction_limit_in_mb in the yaml
* set concurrent_compactors to 2 in the yaml
* enable compaction again

Once everything has settled down, restore in_memory_compaction_limit_in_mb and concurrent_compactors.

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/06/2012, at 7:53 AM, Rustam Aliyev wrote:

Hi all,

I have an SCF with ~250K rows. One of these rows is relatively large: it's a wide row (according to the compaction logs) containing ~100,000 super columns, with an overall size of 1GB. Each super column has an average size of 10K and ~10 sub columns.
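
As a rough sanity check on those figures, the arithmetic can be sketched in Python. This assumes that "10K" means 10 KiB per super column, which is only a guess; the names below are illustrative and not taken from any Cassandra tool.

```python
# Rough size estimate for the wide row, using the figures quoted above.
SUPER_COLUMNS = 100_000              # ~100,000 super columns in the row
AVG_SUPER_COLUMN_BYTES = 10 * 1024   # assumption: "10K" means 10 KiB


def estimated_row_bytes(super_columns: int, avg_bytes: int) -> int:
    """Estimate the row size as (number of super columns) x (average size)."""
    return super_columns * avg_bytes


row_bytes = estimated_row_bytes(SUPER_COLUMNS, AVG_SUPER_COLUMN_BYTES)
row_gib = row_bytes / 1024 ** 3      # convert bytes to GiB
print(f"{row_bytes} bytes is about {row_gib:.2f} GiB")
```

Under that assumption the estimate comes out just under 1 GiB, which agrees with the ~1GB overall size reported by the compaction logs.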

When I try to delete ~90% of the columns in this particular row, the Cassandra nodes that own this wide row (3 of 5, RF=3) quickly run out of heap space. See the logs from one of the hosts here:

http://pastebin.com/raw.php?i=kwn7b3rP

After that, all 3 nodes start flapping up/down, and GC messages (like the one at the bottom of the pastebin above) appear in the logs. Cassandra never recovers from this state, and the only way out is to "kill -9" and start again. On IRC it was suggested that it enters a GC death spiral.

I tried to throttle delete requests on the client side by sending a batch of 100 delete requests every 500ms, so no more than 200 deletes/sec. But it didn't help. I can reduce it further to 100/sec, but I don't think it will help much.
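
The throttling described above can be sketched roughly as follows. This is a minimal illustration, not the actual client code: `delete_batch` is a hypothetical callable standing in for whatever the client library uses to send a batch of deletes, and the `sleep` parameter exists only so the pause can be replaced in tests.

```python
import time
from typing import Callable, Iterable, Iterator, List, Sequence


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def throttled_delete(
    column_names: Iterable[str],
    delete_batch: Callable[[Sequence[str]], None],  # hypothetical client call
    batch_size: int = 100,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send deletes in batches, pausing `interval_s` after each batch.

    The rate is at most batch_size / interval_s deletes per second
    (200/sec with the defaults). Returns the number of batches sent.
    """
    sent = 0
    for batch in batched(column_names, batch_size):
        delete_batch(batch)
        sent += 1
        sleep(interval_s)
    return sent
```

Dropping to 100 deletes/sec would mean passing `batch_size=50` or `interval_s=1.0`.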

I delete millions of columns from other rows in this SCF at the same rate and have never hit this problem. It only happens when I try to delete from this particular wide row.

So right now I don't know how I can delete these columns. Any ideas?


Many thanks,
Rustam.

    
