Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Date: Thu, 8 Jun 2017 07:13:18 +0000 (UTC)
From: "Pedro Gordo (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12989371.1468481435000.25602.1496905998184@Atlassian.JIRA>
In-Reply-To: <JIRA.12989371.1468481435000@Atlassian.JIRA>
References: <JIRA.12989371.1468481435000@Atlassian.JIRA> <JIRA.12989371.1468481435705@jira-lw-us.apache.org>
Subject: [jira] [Updated] (CASSANDRA-12201) Burst Hour Compaction Strategy
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Thu, 08 Jun 2017 07:13:25 -0000


     [ https://issues.apache.org/jira/browse/CASSANDRA-12201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pedro Gordo updated CASSANDRA-12201:
------------------------------------
    Description:
The motivation for this strategy is to take advantage of periods of the day when there's less I/O on the cluster. This period will be called “Burst Hour” (BH), hence the name “Burst Hour Compaction Strategy” (BHCS).
The following process would be triggered during BH:

1. Read all the SSTables and detect which partition keys are present in more SSTables than the minimum compaction threshold.

2. Gather all the SSTables that have keys present in other SSTables, where each such key has a number of copies at least equal to the minimum compaction threshold.

3. Repeat step 2 until the bucket of gathered SSTables reaches the maximum compaction threshold (32 by default), or until we've searched all the keys.

4. The compaction itself will be done by MaxSSTableSizeWriter. The compacted SSTables will have a maximum size equal to the configurable value of max_sstable_size.
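
Steps 1 to 3 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code attached to the issue: the function name select_bucket, the dict-of-sets input, the default min_threshold of 4, and the reading of the threshold as "at least" are all hypothetical choices made for the example.

```python
from collections import defaultdict


def select_bucket(sstables, min_threshold=4, max_threshold=32):
    """Pick SSTables for one compaction bucket.

    `sstables` is a hypothetical stand-in for the real SSTable metadata:
    a dict mapping an SSTable name to the set of partition keys it holds.
    """
    # Step 1: index which SSTables contain each partition key.
    tables_by_key = defaultdict(set)
    for name, keys in sstables.items():
        for key in keys:
            tables_by_key[key].add(name)

    # Steps 2 and 3: gather the SSTables that share a sufficiently
    # replicated key, stopping when the bucket is full or the keys run out.
    bucket = []
    seen = set()
    for key in sorted(tables_by_key):
        holders = tables_by_key[key]
        if len(holders) < min_threshold:  # assumption: "at least" semantics
            continue
        for name in sorted(holders):
            if name not in seen:
                seen.add(name)
                bucket.append(name)
                if len(bucket) == max_threshold:
                    return bucket
    return bucket
```

For example, with SSTables a, b and c all holding key k1 and a min_threshold of 3, the bucket would contain a, b and c, while an SSTable that only shares a key with one other SSTable would be left out.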

The maximum compaction task (the nodetool compact command) performs exactly the same operation as the background compaction task, except that it can be triggered outside of the Burst Hour.

This strategy tries to address three issues of the existing compaction strategies:
- Due to max_sstable_size_limit, there's no need to reserve disk space for a huge compaction.
- The number of SSTables that we need to read from to answer a read query will be kept consistently low and is controllable through the referenced_sstable_limit property.
- It removes the dependency on continuous high I/O.

Possible future improvements:
- Continuously evaluate the number of pending compactions and the I/O status, and start (or skip) the compaction based on that.
- If, during the day, the total size of the SSTables in a family set reaches a certain maximum, then background compaction can occur anyway. This maximum should be set high because of the high CPU usage of BHCS.
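
The gating idea behind the Burst Hour and the first improvement above could look something like the sketch below. Everything in it is an assumption made for illustration: the function names, the window expressed as two datetime.time values, and the max_pending and max_io_utilization limits do not come from the issue.

```python
from datetime import time


def in_burst_hour(now, start, end):
    """Return True when `now` falls inside [start, end).

    The window may wrap past midnight, for example 23:00 to 04:00.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_compact(now, start, end, pending_compactions, io_utilization,
                   max_pending=100, max_io_utilization=0.5):
    """Decide whether to start a compaction (hypothetical policy)."""
    # Inside the Burst Hour, compaction is always allowed.
    if in_burst_hour(now, start, end):
        return True
    # Outside it, only compact when the backlog is large and I/O is quiet.
    return (pending_compactions > max_pending
            and io_utilization < max_io_utilization)
```

With a window of 01:00 to 05:00, a call at 12:00 with 200 pending compactions and 20% I/O utilization would be allowed under this policy, while the same call at 90% I/O utilization would not.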

  was:
Although it may be subject to change, for the moment I plan to create a strategy that revolves around taking advantage of periods of the day when there's less I/O on the cluster. This period will be called “Burst Hour” (BH), hence the name “Burst Hour Compaction Strategy” (BHCS).
The following process would be triggered during BH:

1. Read all the SSTables and detect which partition keys are present in more SSTables than a configurable value, which I'll call referenced_sstable_limit. This value will be three by default.

2. Group all the repeated keys with a reference to the SSTables containing them.

3. Calculate the total size of the SSTables which will be merged for the first partition key on the list created in step 2. If the calculated size is larger than a property which I'll call max_sstable_size (also configurable), more than one SSTable will be created in step 4.

4. During the merge, the data will be streamed from the SSTables until the output reaches a size close to max_sstable_size. At that point the stream is paused and the new SSTable is closed, becoming immutable. Repeat the streaming process until we've merged all the SSTables for the partition key that we're iterating over.

5. Cycle through the rest of the collection created in step 2 and remove any SSTables which no longer exist because they were merged in step 4. An alternative course of action here would be, instead of removing the SSTable from the collection, to change its reference to the SSTable(s) which were created in step 4.

6. Repeat steps 3 to 5 until we've traversed the entire collection created in step 2.
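
The size-capped streaming merge described in step 4 can be illustrated with the sketch below. It is a simplified model under stated assumptions, not the proposed implementation: rows are hypothetical (key, size_in_bytes) tuples, each input run is already sorted, and rows with the same key are kept side by side rather than reconciled as a real compaction would do.

```python
import heapq


def merge_with_size_cap(sorted_runs, max_sstable_size):
    """Merge sorted runs of (key, size_in_bytes) rows into size-capped outputs.

    Each returned list stands in for one newly written, immutable SSTable.
    """
    outputs = []
    current, current_size = [], 0
    # heapq.merge streams the rows in sorted order without loading
    # every run into memory at once.
    for key, size in heapq.merge(*sorted_runs):
        # Close the current output once the next row would exceed the cap.
        if current and current_size + size > max_sstable_size:
            outputs.append(current)
            current, current_size = [], 0
        current.append((key, size))
        current_size += size
    if current:
        outputs.append(current)
    return outputs
```

Merging two runs of two 40-byte rows each with a cap of 100 bytes would produce two outputs of two rows each, so no single output requires disk space for the whole merged data set.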


This strategy addresses three issues of the existing compaction strategies:
- Due to max_sstable_size_limit, there's no need to reserve disk space for a huge compaction, as can happen with STCS.
- The number of SSTables that we need to read from to answer a read query will be kept consistently low and is controllable through the referenced_sstable_limit property. This addresses the STCS scenario in which a read might have to touch many SSTables.
- It removes LCS's dependency on continuous high I/O.


> Burst Hour Compaction Strategy
> ------------------------------
>
>                 Key: CASSANDRA-12201
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12201
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Pedro Gordo
>   Original Estimate: 1,008h
>  Remaining Estimate: 1,008h

