Subject: Deleting from SCF wide row makes node unresponsive
From: Rustam Aliyev
To: user@cassandra.apache.org
Date: Sat, 2 Jun 2012 12:43:48 +0500

Hi all,

I have a super column family (SCF) with ~250K rows. One of these rows is relatively large - it's a wide row (according to the compaction logs) containing ~100,000 super columns, with an overall size of about 1 GB. Each super column averages 10 KB and has ~10 sub columns.

When I try to delete ~90% of the columns in this particular row, the Cassandra nodes which own this wide row (3 of 5, RF=3) quickly run out of heap space. See logs from one of the hosts here:

http://pastebin.com/raw.php?i=kwn7b3rP

After that, all 3 nodes start flapping up/down and GC messages (like the one at the bottom of the pastebin above) appear in the logs. Cassandra never recovers from this state, and the only way out is to "kill -9" the process and start it again. On IRC it was suggested that it enters a GC death spiral.

I tried to throttle delete requests on the client side - sending a batch of 100 delete requests every 500 ms, so no more than 200 deletes/sec. But it didn't help. I can reduce it further to 100/sec, but I don't think it will help much.
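
To make the throttling described above concrete, here is a minimal sketch of that kind of client-side loop. It is not the actual client code: `send_batch` is a hypothetical stand-in for whatever call issues the batch of deletes, and the function names, the `sleep` parameter and the defaults (100 per batch, 0.5 s pause) are assumptions taken only from the numbers in this message.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def throttled_delete(
    column_names: Sequence[str],
    send_batch: Callable[[List[str]], None],  # hypothetical: issues one batch of deletes
    batch_size: int = 100,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send deletes in fixed-size batches, pausing after each batch.

    Returns the number of batches sent.
    """
    batches = 0
    for batch in chunked(column_names, batch_size):
        send_batch(batch)
        batches += 1
        sleep(interval_s)  # caps the rate at batch_size / interval_s
    return batches


def max_deletes_per_second(batch_size: int, interval_s: float) -> float:
    """Upper bound on the delete rate for the loop above."""
    return batch_size / interval_s
```

With 100 deletes per batch and a 0.5 s pause, `max_deletes_per_second(100, 0.5)` gives the 200 deletes/sec ceiling mentioned above. The sketch only limits the request rate; it says nothing about how much work each delete causes on the nodes that own the row.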

I delete millions of columns from other rows in this SCF at the same rate and have never hit this problem. It only happens when I try to delete from this particular wide row.

So right now I don't know how I can delete these columns. Any ideas?


Many thanks,
Rustam.