From: Chris Mawata
Date: Sun, 15 Jan 2017 06:21:44 -0500
Subject: Re: Backups eating up disk space
To: user@cassandra.apache.org

You don't have a viable solution because you are not making a snapshot as
a starting point. After a while you will have a lot of backup data. Using
the backups to get your cluster to a given state will involve copying a
very large amount of backup data, possibly more than the capacity of your
cluster, followed by a tremendous amount of compaction. If your topology
changes, life could really get miserable. I would counsel taking periodic
snapshots so that your possible bad day in the future is less bad.

On Jan 13, 2017 8:01 AM, "Kunal Gangakhedkar" <kgangakhedkar@gmail.com>
wrote:

> Great, thanks a lot to all for the help :)
>
> I finally took the dive and went with Razi's suggestions.
> In summary, this is what I did:
>
> - turn off incremental backups on each of the nodes in rolling fashion
> - remove the 'backups' directory from each keyspace on each node.
>
> This ended up freeing up almost 350GB on each node - yay :)
>
> Again, thanks a lot for the help, guys.
>
> Kunal
>
> On 12 January 2017 at 21:15, Khaja, Raziuddin (NIH/NLM/NCBI) [C]
> <raziuddin.khaja@nih.gov> wrote:
>
>> Snapshots are slightly different from backups.
>>
>> In my explanation of the hardlinks created in the backups folder,
>> notice that compacted sstables never end up in the backups folder.
>>
>> On the other hand, a snapshot is meant to represent the data at a
>> particular moment in time. Thus, the snapshots directory contains
>> hardlinks to all active sstables at the time the snapshot was taken,
>> which would include compacted sstables, as well as any sstables from
>> memtable flushes or streamed from other nodes that exist in both the
>> table directory and the backups directory.
>>
>> So, that would be the difference between snapshots and backups.
>>
>> Best regards,
>> -Razi
>>
>> *From: *Alain RODRIGUEZ <arodrime@gmail.com>
>> *Reply-To: *"user@cassandra.apache.org"
>> *Date: *Thursday, January 12, 2017 at 9:16 AM
>> *To: *"user@cassandra.apache.org"
>> *Subject: *Re: Backups eating up disk space
>>
>> My 2 cents,
>>
>> > As I mentioned earlier, we're not currently using snapshots - it's
>> > only the backups that are bothering me right now.
>>
>> I believe the backups folder is just the new name for the previously
>> called snapshots folder. But I can be completely wrong, I haven't
>> played that much with snapshots in new versions yet.
>>
>> Anyway, some operations in Apache Cassandra can trigger a snapshot:
>>
>> - Repair (when not using the parallel option but sequential repairs
>>   instead)
>> - Truncating a table (by default)
>> - Dropping a table (by default)
>> - Maybe others I can't think of...?
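>>
>> To see which snapshots already exist on a node and how much space they
>> actually hold, something like this works (a minimal sketch - it
>> assumes the default /var/lib/cassandra/data layout, so adjust the path
>> for your install):
>>
>>     # snapshot names, size on disk and true (non-shared) size
>>     nodetool listsnapshots
>>
>>     # total on-disk footprint of all snapshot directories
>>     du -sch /var/lib/cassandra/data/*/*/snapshots 2>/dev/null | tail -1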
>>
>> If you want to clean space but still keep a backup you can run:
>>
>> "nodetool clearsnapshot"
>> "nodetool snapshot <whatever>"
>>
>> This way, and for a while, data won't be taking space, as old files
>> will be cleaned and the new files will be only hardlinks, as detailed
>> above. Then you might want to work out a proper backup policy,
>> probably implying getting data out of the production servers (a lot of
>> people use S3 or similar services). Or just do that from time to time,
>> meaning you only keep one backup and disk space behaviour will be hard
>> to predict.
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - @arodream - alain@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2017-01-12 6:42 GMT+01:00 Prasenjit Sarkar <prasenjit.sarkar@datos.io>:
>>
>> Hi Kunal,
>>
>> Razi's post does give a very lucid description of how Cassandra
>> manages the hard links inside the backup directory.
>>
>> Where it needs clarification is the following:
>>
>> --> incremental backups is a system-wide setting, so it's an
>> all-or-nothing approach
>>
>> --> as multiple people have stated, incremental backups do not create
>> hard links to compacted sstables; even so, they can bloat the size of
>> your backups
>>
>> --> again, as stated, it is general industry practice to place backups
>> in a secondary storage location separate from the main production
>> site, so it is best to move them to secondary storage before applying
>> rm to the backups folder
>>
>> In my experience with production clusters, managing the backups folder
>> across multiple nodes can be painful if the objective is to ever
>> recover data. With the usual disclaimers, it is better to rely on
>> third-party vendors to accomplish this than on scripts/tablesnap.
>>
>> Regards,
>> Prasenjit
>>
>> On Wed, Jan 11, 2017 at 7:49 AM, Khaja, Raziuddin (NIH/NLM/NCBI) [C]
>> <raziuddin.khaja@nih.gov> wrote:
>>
>> Hello Kunal,
>>
>> Caveat: I am not a super-expert on Cassandra, but it helps to explain
>> to others in order to eventually become an expert, so if my
>> explanation is wrong, I would hope others would correct me. :)
>>
>> The active sstables/data files are all the files located in the
>> directory for the table.
>> You can safely remove all files under the backups/ directory and the
>> directory itself.
>> Removing any files that are current hard-links inside backups won't
>> cause any issues, and I will explain why.
>>
>> Have you looked at your cassandra.yaml file and checked the setting
>> for incremental_backups? If it is set to true and you don't want to
>> make new backups, you can set it to false, so that after you clean up,
>> you will not have to clean up the backups again.
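>>
>> If restarting each node to pick up that cassandra.yaml change is
>> inconvenient, newer versions can also toggle it at runtime - a hedged
>> sketch, assuming your version ships these nodetool subcommands:
>>
>>     # stop creating new incremental-backup hardlinks on this node
>>     nodetool disablebackup
>>
>>     # check the current state
>>     nodetool statusbackup
>>
>>     # note: the runtime toggle does not survive a restart; set
>>     # incremental_backups: false in cassandra.yaml as well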
>>
>> Explanation:
>>
>> Let's look at the definition of incremental backups again: "Cassandra
>> creates a hard link to each SSTable flushed or streamed locally in a
>> backups subdirectory of the keyspace data."
>>
>> Suppose we have a directory path:
>> my_keyspace/my_table-some-uuid/backups/
>> In the rest of the discussion, when I refer to "table directory", I
>> explicitly mean the directory: my_keyspace/my_table-some-uuid/
>> When I refer to the backups/ directory, I explicitly mean:
>> my_keyspace/my_table-some-uuid/backups/
>>
>> Suppose that you have an sstable-A that was either flushed from a
>> memtable or streamed from another node.
>> At this point, you have a hardlink to sstable-A in your table
>> directory, and a hardlink to sstable-A in your backups/ directory.
>> Suppose that you have another sstable-B that was also either flushed
>> from a memtable or streamed from another node.
>> At this point, you have a hardlink to sstable-B in your table
>> directory, and a hardlink to sstable-B in your backups/ directory.
>>
>> Next, suppose compaction were to occur, where say sstable-A and
>> sstable-B would be compacted to produce sstable-C, representing all
>> the data from A and B.
>> Now, sstable-C will live in your main table directory, and the
>> hardlinks to sstable-A and sstable-B will be deleted from the main
>> table directory, but sstable-A and sstable-B will continue to exist in
>> backups/.
>> At this point, in your main table directory, you will have a hardlink
>> to sstable-C. In your backups/ directory you will have hardlinks to
>> sstable-A and sstable-B.
>>
>> Thus, your main table directory is not cluttered with old un-compacted
>> sstables, and only has the sstables, along with other files, that are
>> actively being used.
>>
>> To drive the point home...
>> Suppose that you have another sstable-D that was either flushed from a
>> memtable or streamed from another node.
>> At this point, in your main table directory, you will have sstable-C
>> and sstable-D. In your backups/ directory you will have hardlinks to
>> sstable-A, sstable-B, and sstable-D.
>>
>> Next, suppose compaction were to occur, where say sstable-C and
>> sstable-D would be compacted to produce sstable-E, representing all
>> the data from C and D.
>> Now, sstable-E will live in your main table directory, and the
>> hardlinks to sstable-C and sstable-D will be deleted from the main
>> table directory, but sstable-D will continue to exist in backups/.
>> At this point, in your main table directory, you will have a hardlink
>> to sstable-E. In your backups/ directory you will have hardlinks to
>> sstable-A, sstable-B and sstable-D.
>>
>> As you can see, the backups/ directory quickly accumulates all
>> un-compacted sstables and progressively uses up more and more space.
>> Also, note that the backups/ directory does not contain sstables
>> generated by compaction, such as sstable-C and sstable-E.
>> It is safe to delete the entire backups/ directory because all the
>> data is represented in the compacted sstable-E.
>> I hope this explanation was clear and gives you confidence in using rm
>> to delete the backups/ directory.
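>>
>> You can actually watch this on a live node: while an sstable is shared
>> between the table directory and backups/, its hardlink count is 2, and
>> sstables that were compacted away survive only in backups/ with a link
>> count of 1. A quick illustration (hypothetical keyspace/table names
>> and default data path - adjust for your install):
>>
>>     cd /var/lib/cassandra/data/my_keyspace/my_table-some-uuid
>>
>>     # 2nd column is the hardlink count: 2 = also linked in backups/
>>     ls -l *-Data.db backups/*-Data.db
>>
>>     # matching inode numbers show the two names share one file on disk
>>     ls -i *-Data.db backups/*-Data.db | sort -n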
>>
>> Best regards,
>> -Razi
>>
>> *From: *Kunal Gangakhedkar <kgangakhedkar@gmail.com>
>> *Reply-To: *"user@cassandra.apache.org"
>> *Date: *Wednesday, January 11, 2017 at 6:47 AM
>> *To: *"user@cassandra.apache.org"
>> *Subject: *Re: Backups eating up disk space
>>
>> Thanks for the reply, Razi.
>>
>> As I mentioned earlier, we're not currently using snapshots - it's
>> only the backups that are bothering me right now.
>>
>> So my next question is pertaining to this statement of yours:
>>
>> > As far as I am aware, using *rm* is perfectly safe to delete the
>> > directories for snapshots/backups as long as you are careful not to
>> > delete your actively used sstable files and directories.
>>
>> How do I find out which are the actively used sstables?
>> If by that you mean the main data files, does that mean I can safely
>> remove all files ONLY under the "backups/" directory?
>> Or can removing any files that are current hard-links inside backups
>> potentially cause any issues?
>>
>> Thanks,
>> Kunal
>>
>> On 11 January 2017 at 01:06, Khaja, Raziuddin (NIH/NLM/NCBI) [C]
>> <raziuddin.khaja@nih.gov> wrote:
>>
>> Hello Kunal,
>>
>> I would take a look at the following configuration options in
>> cassandra.yaml:
>>
>> *Common automatic backup settings*
>>
>> *incremental_backups:*
>> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__incremental_backups
>>
>> (Default: false) Backs up data updated since the last snapshot was
>> taken. When enabled, Cassandra creates a hard link to each SSTable
>> flushed or streamed locally in a backups subdirectory of the keyspace
>> data. Removing these links is the operator's responsibility.
>>
>> *snapshot_before_compaction*:
>> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__snapshot_before_compaction
>>
>> (Default: false) Enables or disables taking a snapshot before each
>> compaction. A snapshot is useful to back up data when there is a data
>> format change. Be careful using this option: Cassandra does not clean
>> up older snapshots automatically.
>>
>> *Advanced automatic backup setting*
>>
>> *auto_snapshot*:
>> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__auto_snapshot
>>
>> (Default: true) Enables or disables whether Cassandra takes a snapshot
>> of the data before truncating a keyspace or dropping a table. To
>> prevent data loss, DataStax strongly advises using the default
>> setting. If you set auto_snapshot to false, you lose data on
>> truncation or drop.
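>>
>> Taken together, the relevant fragment of cassandra.yaml would look
>> something like this (values shown are the defaults quoted above; an
>> illustrative sketch, not a complete file):
>>
>>     # hardlink each flushed/streamed sstable into backups/
>>     # (removing the links is the operator's responsibility)
>>     incremental_backups: false
>>
>>     # snapshot before every compaction; old snapshots are never
>>     # cleaned up automatically
>>     snapshot_before_compaction: false
>>
>>     # snapshot before truncating a keyspace or dropping a table;
>>     # disabling this risks data loss
>>     auto_snapshot: true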
>>
>> *nodetool* also provides methods to manage snapshots:
>> http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsNodetool.html
>>
>> See the specific commands:
>>
>> - nodetool clearsnapshot
>>   Removes one or more snapshots.
>> - nodetool listsnapshots
>>   Lists snapshot names, size on disk, and true size.
>> - nodetool snapshot
>>   Takes a snapshot of one or more keyspaces, or of a table, to back up
>>   data.
>>
>> As far as I am aware, using *rm* is perfectly safe to delete the
>> directories for snapshots/backups, as long as you are careful not to
>> delete your actively used sstable files and directories. I think the
>> *nodetool clearsnapshot* command is provided so that you don't
>> accidentally delete actively used files. Last I used *clearsnapshot*
>> (a very long time ago), I thought it left behind the directory, but
>> this could have been fixed in newer versions (so you might want to
>> check that).
>>
>> HTH,
>> -Razi
>>
>> *From: *Jonathan Haddad <jon@jonhaddad.com>
>> *Reply-To: *"user@cassandra.apache.org"
>> *Date: *Tuesday, January 10, 2017 at 12:26 PM
>> *To: *"user@cassandra.apache.org"
>> *Subject: *Re: Backups eating up disk space
>>
>> If you remove the files from the backup directory, you would not have
>> data loss in the case of a node going down. They're hard links to the
>> same files that are in your data directory, and are created when an
>> sstable is written to disk. At that time, they take up (almost) no
>> space, so they aren't a big deal; but when the sstable gets compacted,
>> they stick around, so they end up not freeing space up.
>>
>> Usually you use incremental backups as a means of moving the sstables
>> off the node to a backup location. If you're not doing anything with
>> them, they're just wasting space and you should disable incremental
>> backups.
>>
>> Some people take snapshots and then rely on incremental backups.
>> Others use the tablesnap utility, which does sort of the same thing.
>>
>> On Tue, Jan 10, 2017 at 9:18 AM Kunal Gangakhedkar
>> <kgangakhedkar@gmail.com> wrote:
>>
>> Thanks for the quick reply, Jon.
>>
>> But what about the case of a node/cluster going down? Would there be
>> data loss if I remove these files manually?
>>
>> How is it typically managed in production setups?
>> What are the best practices for the same?
>> Do people take snapshots on each node before removing the backups?
>>
>> This is my first production deployment - so, still trying to learn.
>>
>> Thanks,
>> Kunal
>>
>> On 10 January 2017 at 21:36, Jonathan Haddad <jon@jonhaddad.com>
>> wrote:
>>
>> You can just delete them off the filesystem (rm)
>>
>> On Tue, Jan 10, 2017 at 8:02 AM Kunal Gangakhedkar
>> <kgangakhedkar@gmail.com> wrote:
>>
>> Hi all,
>>
>> We have a 3-node Cassandra cluster with incremental backup set to
>> true.
>> Each node has a 1TB data volume that stores Cassandra data.
>>
>> The load in the output of 'nodetool status' comes up at around 260GB
>> on each node.
>> All our keyspaces use replication factor = 3.
>>
>> However, the df output shows the data volumes consuming around 850GB
>> of space.
>> I checked the keyspace directory structures - most of the space goes
>> in <CASS_DATA_VOL>/data/<KEYSPACE>/<CF>/backups.
>>
>> We have never manually run snapshots.
>>
>> What is the typical procedure to clear the backups?
>> Can it be done without taking the node offline?
>>
>> Thanks,
>> Kunal
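>>
>> (For anyone sizing up the same problem: a quick way to total the
>> backups directories across all keyspaces is a one-liner along these
>> lines, assuming the layout above - substitute your actual data volume
>> path.)
>>
>>     # per-table backups usage, largest first, with a grand total
>>     du -sch <CASS_DATA_VOL>/data/*/*/backups 2>/dev/null | sort -rh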