From: Michał Michalski <michalm@opera.com>
Date: Thu, 04 Jul 2013 13:36:28 +0200
To: user@cassandra.apache.org
Subject: Re: going down from RF=3 to RF=2, repair constantly falls over with JVM OOM

I don't think you need to run repair if you decrease RF. At least I wouldn't do it.

In the case of *decreasing* RF you have 3 nodes containing some data, but only 2 of them should store it from now on, so you should rather run cleanup instead of repair, to get rid of the data on the 3rd replica. And I guess it should work (in terms of disk space and memory), if you've been able to perform compaction.

Repair makes sense if you *increase* RF, so the data are streamed to the new replicas.

M.
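For reference, roughly the sequence I mean ("MyKS" and <node-address> are placeholders; the cassandra-cli syntax below is from memory for the 1.0 branch, so double-check it against your version):

    # 1. Lower the replication factor for the keyspace (cassandra-cli):
    update keyspace MyKS with strategy_options = {replication_factor:2};

    # 2. Then, on each node in turn, drop the data it no longer owns:
    nodetool -h <node-address> cleanup

Cleanup only rewrites the node's local SSTables; unlike repair it doesn't stream data between nodes, which is why it tends to be much gentler on nodes that are already short on disk and memory.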
On 04.07.2013 12:20, Evan Dandrea wrote:
> Hi,
>
> We've made the mistake of letting our nodes get too large, now holding
> about 3TB each. We ran out of enough free space to have a successful
> compaction, and because we're on 1.0.7, enabling compression to get
> out of the mess wasn't feasible. We tried adding another node, but we
> think this may have put too much pressure on the existing ones it was
> replicating from, so we backed out.
>
> So we decided to drop RF down to 2 from 3 to relieve the disk pressure
> and started building a secondary cluster with lots of 1 TB nodes. We
> ran repair -pr on each node, but it's failing with a JVM OOM on one
> node while another node is streaming from it for the final repair.
>
> Does anyone know what we can tune to get the cluster stable enough to
> put it in a multi-dc setup with the secondary cluster? Do we actually
> need to wait for these RF3->RF2 repairs to stabilize, or could we
> point it at the secondary cluster without worry of data loss?
>
> We've set the heap on these two problematic nodes to 20GB, up from the
> equally too high 12GB, but we're still hitting OOM. I had seen in
> other threads that tuning down compaction might help, so we're trying
> the following:
>
> in_memory_compaction_limit_in_mb 32 (down from 64)
> compaction_throughput_mb_per_sec 8 (down from 16)
> concurrent_compactors 2 (the nodes have 24 cores)
> flush_largest_memtables_at 0.45 (down from 0.50)
> stream_throughput_outbound_megabits_per_sec 300 (down from 400)
> reduce_cache_sizes_at 0.5 (down from 0.6)
> reduce_cache_capacity_to 0.35 (down from 0.4)
>
> -XX:CMSInitiatingOccupancyFraction=30
>
> Here's the log from the most recent repair failure:
>
> http://paste.ubuntu.com/5843017/
>
> The OOM starts at line 13401.
>
> Thanks for whatever insight you can provide.
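For anyone following along: the settings Evan lists above live in cassandra.yaml, and the CMS flag and heap size are set in conf/cassandra-env.sh. A rough sketch of the relevant fragments with his values (file layout from memory for 1.0.x, so verify against your install):

    # conf/cassandra.yaml (values from the message above)
    in_memory_compaction_limit_in_mb: 32
    compaction_throughput_mb_per_sec: 8
    concurrent_compactors: 2
    flush_largest_memtables_at: 0.45
    stream_throughput_outbound_megabits_per_sec: 300
    reduce_cache_sizes_at: 0.5
    reduce_cache_capacity_to: 0.35

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="20G"
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=30"

All of these require a node restart to take effect, except compaction and streaming throughput, which can also be adjusted at runtime via nodetool setcompactionthroughput / setstreamthroughput.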