Subject: Re: RF update
From: Will Martin <will@voodoolunchbox.com>
Date: Mon, 15 Oct 2012 21:32:30 -0400
To: user@cassandra.apache.org

+1 It doesn't make sense that the xfr compactions are heavy unless they are translating the file. This could be a protocol mismatch; however, I would expect the requirements for node-level compaction and wire compaction to be pretty different.

On Oct 15, 2012, at 4:42 PM, Matthias Broecheler wrote:

> Hey,
>
> we are writing a lot of data into a Cassandra cluster for a batch loading use case. We cannot use the sstable batch loader, so in order to speed up the loading process we are using RF=1 while the data is loading. After the load is complete, we want to increase the RF.
> For that, we are updating the RF in the schema and then running the node repair tool on each Cassandra instance to stream the data over. However, we are noticing that this process is slowed down by a lot of compactions (the actual streaming of data only takes a couple of minutes).
>
> Cassandra is already running a major compaction after the data loading process has completed. But then, there seem to be two more compactions (one on the sender and one on the receiver), and those take a very long time even on the AWS high-I/O instance with no compaction throttling.
>
> Question: these additional compactions seem redundant, since there are no reads or writes on the cluster after the first major compaction (immediately after the data load). Is that right? And if so, what can we do to avoid them? We are currently waiting multiple days.
>
> Thank you very much for your help,
> Matthias
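
For reference, the "update RF, then repair" sequence Matthias describes would look roughly like this. A minimal sketch, assuming a CQL3 shell, SimpleStrategy, and a keyspace named "ks" (the keyspace name and target RF here are illustrative, not from the thread):

    -- Raise the replication factor on the keyspace.
    ALTER KEYSPACE ks
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

Then, on each node, stream the newly owned replicas over; optionally lift the compaction throttle while repair runs:

    # Repair only this keyspace, so just the RF change is streamed.
    nodetool repair ks

    # 0 disables compaction throughput throttling (MB/s) on this node.
    nodetool setcompactionthroughput 0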