From: Matthias Broecheler
Date: Mon, 15 Oct 2012 13:42:59 -0700
Subject: RF update
To: user@cassandra.apache.org

Hey,

we are writing a lot of data into a Cassandra cluster for a batch loading use case. We cannot use the sstable batch loader, so in order to speed up the loading process we are using RF=1 while the data is loading. After the load is complete, we want to increase the RF. To do that, we update the RF in the schema and then run the node repair tool on each Cassandra instance to stream the data over. However, we are noticing that this process is slowed down by a lot of compactions (the actual streaming of data only takes a couple of minutes).

Cassandra is already running a major compaction after the data loading process has completed. But then there appear to be two more compactions (one on the sender and one on the receiver), and those take a very long time, even on AWS high I/O instances with no compaction throttling.

Question: These additional compactions seem redundant, since there are no reads or writes on the cluster after the first major compaction (immediately after the data load). Is that right? And if so, what can we do to avoid them? We are currently waiting multiple days.

Thank you very much for your help,
Matthias
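
For reference, the procedure described above (raise the RF in the schema, then repair each node in turn) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the keyspace name `bulkload`, the host addresses, and the helper names `alter_keyspace_cql` and `repair_commands` are hypothetical, and `SimpleStrategy` is assumed as the replication strategy. The `ALTER KEYSPACE` form shown is the CQL 3 syntax; depending on the Cassandra version, the schema change may have to be made through cassandra-cli instead. The script only builds and prints the statement and commands, it does not execute anything.

```python
from typing import List


def alter_keyspace_cql(keyspace: str, replication_factor: int) -> str:
    """Build a CQL statement that sets a new replication factor.

    Assumes SimpleStrategy; a multi-datacenter cluster would need
    NetworkTopologyStrategy with per-datacenter factors instead.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )


def repair_commands(hosts: List[str], keyspace: str) -> List[List[str]]:
    """Build one `nodetool repair` invocation per node, in the given order."""
    return [["nodetool", "-h", host, "repair", keyspace] for host in hosts]


if __name__ == "__main__":
    # Hypothetical hosts and keyspace, for illustration only.
    hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    print(alter_keyspace_cql("bulkload", 3))
    for cmd in repair_commands(hosts, "bulkload"):
        # To actually run a command: subprocess.run(cmd, check=True)
        print(" ".join(cmd))
```

Keeping the commands as argument lists makes it easy to run them one node at a time and stop on the first failure, which matches the per-instance repair described in the message.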
