From: Alexander Dejanovski
Date: Thu, 13 Jun 2019 14:52:13 +0200
Subject: Re: Speed up compaction
To: user@cassandra.apache.org

Hi Léo,

Major compactions in LCS (and minor ones as well) are indeed very slow, and I'm afraid there's not much you can do to speed things up. There are lots of synchronized sections in the LCS code, and it has to do a lot of comparisons between SSTables to make sure a partition won't end up in two SSTables of the same level.
A major compaction will be single-threaded for obvious reasons, and while it is running all the newly flushed SSTables will pile up in L0, since I don't see how Cassandra could otherwise achieve the "one SSTable per partition per level, except L0" guarantee.
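For what it's worth, both effects are easy to watch from nodetool while the compaction runs; this is just a convenience sketch, and my_ks/my_table are placeholders for your keyspace and table:

# Running compactions, with remaining bytes in human-readable units.
nodetool compactionstats -H

# Per-level SSTable counts for an LCS table (the first entry is L0, which
# will grow while the single-threaded major compaction is running).
nodetool tablestats my_ks.my_table | grep "SSTables in each level"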

At this point, your best chance might be to switch the table to STCS, run a major compaction using the "-s" flag (split output, which will create one SSTable per size tier instead of a single big one), and then switch back to LCS, before or after your migration (whatever works best for you). If you go down that path, I'd also recommend trying it on one node first, using JMX to alter the compaction strategy, running the major compaction with nodetool, and checking whether it's indeed faster than the LCS major compaction. Then proceed node by node over JMX (waiting for the major compaction to go through between nodes), and alter the schema only after the last node has been switched to STCS.
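As a rough sketch of what that node-by-node procedure can look like (assumptions: jmxterm as the JMX client, my_ks/my_table as placeholder names, and the CompactionParametersJson attribute that Cassandra 3.x exposes on the table MBean, which is worth double-checking on 3.0.18):

# On one node, open a JMX session (any JMX client works; jmxterm shown here,
# jar name/version illustrative).
java -jar jmxterm-1.0.2-uber.jar -l localhost:7199

# At the jmxterm prompt: point at the table MBean and switch it to STCS.
# This is node-local, bypasses the schema and does not survive a restart.
bean org.apache.cassandra.db:type=Tables,keyspace=my_ks,table=my_table
set CompactionParametersJson {"class":"SizeTieredCompactionStrategy"}

# Back in the shell: run the major compaction with split output (-s), which
# produces one SSTable per size tier instead of a single huge one.
nodetool compact -s my_ks my_table

# Repeat the two steps above on each node in turn, waiting for the compaction
# to finish, then make the change permanent in the schema after the last node:
cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"

Since the JMX change is not persisted, the final ALTER TABLE is what makes the switch stick, and switching back to LCS later is the same statement with 'LeveledCompactionStrategy'.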
You can use more "aggressive" compaction settings to limit read fragmentation, reducing max_threshold to 3 instead of 4 (the default).
Note that doing all this will impact your cluster's performance in ways I cannot predict, and it should be attempted only if you really need to perform this major compaction and cannot wait for it to go through at the current pace.

Cheers,

-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


On Thu, Jun 13, 2019 at 2:07 PM Léo FERLIN SUTTON <lferlin@mailjet.com.invalid> wrote:
> On Thu, Jun 13, 2019 at 12:09 PM Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:
>> On Thu, Jun 13, 2019 at 11:28 AM Léo FERLIN SUTTON <lferlin@mailjet.com.invalid> wrote:

>>> ## Cassandra configuration:
>>> 4 concurrent_compactors
>>> Current compaction throughput: 150 MB/s
>>> Concurrent reads/writes are both set to 128.
>>>
>>> I have also temporarily stopped all repair operations.
>>>
>>> Any ideas about how I can speed this up?

>>
>> Hi,
>>
>> What is the compaction strategy used by this column family?
>>
>> Do you observe this behavior on one of the nodes only? Have you tried to cancel this compaction and see if a new one is started and makes better progress? Can you try to restart the affected node?
>>
>> Regards,
>> --
>> Alex

> I can't believe I forgot that information.
>
> Overall we are talking about a 1.08 TB table, using LCS.
>
> SSTable count: 1047
> SSTables in each level: [15/4, 10, 103/100, 918, 0, 0, 0, 0, 0]
> SSTable Compression Ratio: 0.5192269874287099
> Number of partitions (estimate): 7282253587
>
> We have recently (about a month ago) deleted about 25% of the data in that table.
>
> Letting Cassandra reclaim the disk space on its own (via regular compactions) was too slow for us, so we wanted to force a compaction on the table to reclaim the disk space faster.
>
> The speed of the compaction doesn't seem out of the ordinary for the cluster; it's just that we haven't had such a big compaction before, and the speed alarmed us.
> We never have a big compaction backlog, most of the time fewer than 5 pending tasks (per node).
>
> Finally, we are running Cassandra 3.0.18 and plan to upgrade to 3.11 as soon as our compactions are over.
>
> Regards,
>
> Leo