From: kurt greaves
Date: Tue, 12 Dec 2017 23:47:59 +0000
Subject: Re: Tombstoned data seems to remain after compaction
To: User <user@cassandra.apache.org>

As long as you've limited the throughput of compactions you should be fine (by default it's 16 MB/s; this can be changed through nodetool setcompactionthroughput or in cassandra.yaml) - it will be no different to any other compaction occurring, the compaction will just take longer. You should be aware, however, that a major compaction will use up to double the disk space currently utilised by that table. Considering you've got lots of tombstones it will probably be a lot less than double, but it will still be significant, so ensure you have enough free space for the compaction to complete.
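For concreteness, a minimal sketch of the commands involved (keyspace and table names below are placeholders, and the yaml key shown is the 3.x name):

    # Check the current compaction throttle in MB/s (0 means unthrottled)
    nodetool getcompactionthroughput

    # Adjust it at runtime without a restart
    nodetool setcompactionthroughput 16

    # Equivalent permanent setting in cassandra.yaml:
    #   compaction_throughput_mb_per_sec: 16

    # Then run the major compaction, one node at a time
    nodetool compact my_keyspace my_table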
On 12 December 2017 at 07:44, taka-t@fujitsu.com wrote:

> Hi Jeff, Kurt
>
> Thanks again for your advice.
>
> Of the valuable ideas you provided, I am thinking of executing nodetool compact,
> because it is the simplest one to try and I'm really a novice with Cassandra.
>
> One thing I'm concerned about with this plan is that the major compaction might
> have a serious impact on our production system, which uses Cassandra as a cache
> store for web sessions and similar data.
>
> We use a Cassandra ring with three nodes, replicating to all 3 nodes and using
> QUORUM consistency level on data updates.
>
> Under the conditions above, are there any risks if I execute a major compaction
> on each node one by one? Could the whole system's throughput get seriously worse,
> for example?
>
> I know I'm asking a difficult question because the impact differs from situation
> to situation, but general advice from your experience is highly appreciated!
>
> Regards,
> Takashima
>
> From: Jeff Jirsa [mailto:jjirsa@gmail.com]
> Sent: Tuesday, December 12, 2017 2:35 AM
> To: cassandra <user@cassandra.apache.org>
> Subject: Re: Tombstoned data seems to remain after compaction
>
> Hello Takashima,
>
> Answers inline.
>
> On Sun, Dec 10, 2017 at 11:41 PM, taka-t@fujitsu.com wrote:
>
> Hi Jeff
>
> I appreciate your detailed explanation :)
>
> Ø Expired data gets purged on compaction as long as it doesn't overlap
> with other live data. The overlap thing can be difficult to reason about,
> but it's meant to ensure correctness in the event that you write a value
> with ttl 180, then another value with ttl 1, and you don't want to remove
> the value with ttl 1 until you've also removed the value with ttl 180,
> since that would lead to data being resurrected
>
> I understand that a TTL setting sometimes does not work as we expect,
> especially when we alter the value afterwards, because of Cassandra's data
> consistency functionality. Is my understanding correct?
>
> If by "does not work as you expect" you mean "data is not cleared immediately
> upon expiration", that is correct.
>
> And I am thinking of trying the sstablesplit utility to let Cassandra do a minor
> compaction, because one of the SSTables is the oldest and very large, so I want
> to compact it.
>
> That is offline and requires downtime, which is usually not something you
> want to do if you can avoid it.
>
> Instead, I recommend you consider the tombstone compaction subproperties
> to compaction, which let you force single-SSTable compactions based on
> tombstone percentage (and set that low enough that it reclaims the space
> you want to reclaim).
>
> Perhaps counterintuitively, compaction is most effective at freeing up
> space when it makes one very big file, compared to lots of little files -
> sstablesplit is probably not a good idea. A major compaction may help, if
> you have the extra IO and disk space.
>
> Again, though, you should probably consider using something other than
> STCS going forward.
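To make the tombstone compaction subproperties Jeff mentions above concrete, a rough sketch (keyspace, table, and data-file paths are placeholders, the values are only illustrative, and this assumes the table stays on STCS): tombstone_threshold defaults to 0.2, and unchecked_tombstone_compaction relaxes the safety pre-check so single-SSTable tombstone compactions can run even when SSTable ranges overlap.

    # Estimate droppable tombstones per SSTable before changing anything
    sstablemetadata /path/to/table/*-Data.db | grep -i "droppable"

    # Lower the threshold and allow single-SSTable tombstone compactions
    cqlsh -e "ALTER TABLE my_keyspace.my_table
      WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'tombstone_threshold': '0.1',
        'unchecked_tombstone_compaction': 'true'
      };"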