From: Alexander Dejanovski
Date: Thu, 13 Jun 2019 14:52:13 +0200
Subject: Re: Speed up compaction
To: user@cassandra.apache.org

Hi Léo,

Major compactions in LCS (and minor ones as well) are indeed very slow, and I'm afraid there's not much you can do to speed things up. There are lots of synchronized sections in the LCS code, and it has to do a lot of comparisons between SSTables to make sure a partition won't end up in two SSTables of the same level.
A major compaction will be single-threaded for obvious reasons, and while it is running all the newly flushed SSTables will pile up in L0, since I don't see how Cassandra could otherwise achieve the "one SSTable per partition per level, except L0" guarantee.
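For what it's worth, both effects are easy to watch from nodetool while the compaction runs; this is just a convenience sketch, and my_ks/my_table are placeholders for your keyspace and table:

# Running compactions, with remaining bytes in human-readable units.
nodetool compactionstats -H

# Per-level SSTable counts for an LCS table (the first entry is L0, which
# will grow while the single-threaded major compaction is running).
nodetool tablestats my_ks.my_table | grep "SSTables in each level"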

At this point, your best chance might be to switch the table to STCS, run a major compaction using the "-s" flag (split output, which will create one SSTable per size tier instead of a single big one), and then switch back to LCS, before or after your migration (whatever works best for you). If you go down that path, I'd also recommend trying it on one node first, using JMX to alter the compaction strategy, running the major compaction with nodetool, and checking whether it's indeed faster than the LCS major compaction. Then proceed node by node over JMX (waiting for the major compaction to go through between nodes), and alter the schema only after the last node has been switched to STCS.
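As a rough sketch of what that node-by-node procedure can look like (assumptions: jmxterm as the JMX client, my_ks/my_table as placeholder names, and the CompactionParametersJson attribute that Cassandra 3.x exposes on the table MBean, which is worth double-checking on 3.0.18):

# On one node, open a JMX session (any JMX client works; jmxterm shown here,
# jar name/version illustrative).
java -jar jmxterm-1.0.2-uber.jar -l localhost:7199

# At the jmxterm prompt: point at the table MBean and switch it to STCS.
# This is node-local, bypasses the schema and does not survive a restart.
bean org.apache.cassandra.db:type=Tables,keyspace=my_ks,table=my_table
set CompactionParametersJson {"class":"SizeTieredCompactionStrategy"}

# Back in the shell: run the major compaction with split output (-s), which
# produces one SSTable per size tier instead of a single huge one.
nodetool compact -s my_ks my_table

# Repeat the two steps above on each node in turn, waiting for the compaction
# to finish, then make the change permanent in the schema after the last node:
cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"

Since the JMX change is not persisted, the final ALTER TABLE is what makes the switch stick, and switching back to LCS later is the same statement with 'LeveledCompactionStrategy'.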
You can use more "aggressive" compaction settings to limit read fragmentation, reducing max_threshold to 3 instead of 4 (the default).
Note that doing all this will impact your cluster's performance in ways I cannot predict, and it should be attempted only if you really need to perform this major compaction and cannot wait for it to go through at the current pace.

Cheers,

-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


On Thu, Jun 13, 2019 at 2:07 PM Léo FERLIN SUTTON <lferlin@mailjet.com.invalid> wrote:
> On Thu, Jun 13, 2019 at 12:09 PM Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:
>> On Thu, Jun 13, 2019 at 11:28 AM Léo FERLIN SUTTON <lferlin@mailjet.com.invalid> wrote:

>>> ## Cassandra configuration:
>>> 4 concurrent_compactors
>>> Current compaction throughput: 150 MB/s
>>> Concurrent reads/writes are both set to 128.
>>>
>>> I have also temporarily stopped all repair operations.
>>>
>>> Any ideas about how I can speed this up?

>>
>> Hi,
>>
>> What is the compaction strategy used by this column family?
>>
>> Do you observe this behavior on one of the nodes only? Have you tried to cancel this compaction and see if a new one is started and makes better progress? Can you try to restart the affected node?
>>
>> Regards,
>> --
>> Alex

> I can't believe I forgot that information.
>
> Overall we are talking about a 1.08 TB table, using LCS.
>
> SSTable count: 1047
> SSTables in each level: [15/4, 10, 103/100, 918, 0, 0, 0, 0, 0]
> SSTable Compression Ratio: 0.5192269874287099
> Number of partitions (estimate): 7282253587
>
> We have recently (about a month ago) deleted about 25% of the data in that table.
>
> Letting Cassandra reclaim the disk space on its own (via regular compactions) was too slow for us, so we wanted to force a compaction on the table to reclaim the disk space faster.
>
> The speed of the compaction doesn't seem out of the ordinary for the cluster; it's just that we haven't had such a big compaction before, and the speed alarmed us.
> We never have a big compaction backlog, most of the time fewer than 5 pending tasks (per node).
>
> Finally, we are running Cassandra 3.0.18 and plan to upgrade to 3.11 as soon as our compactions are over.
>
> Regards,
>
> Leo