From: Eric Stevens
Date: Sat, 30 Jul 2016 00:03:58 +0000
To: user@cassandra.apache.org
Subject: Re: Re : Purging tombstones from a particular row in SSTable

I haven't tested that specifically, but I haven't bumped into any particular optimization that allows it to skip reading an sstable where the entire relevant partition has been row-tombstoned. It's possible that something like that could happen by examining min/max timestamps on sstables, and not reading from any sstable with a partition-level tombstone where the max timestamp is less than the timestamp of the partition tombstone. However, that presumes it can have read the tombstones from each sstable before it read the occluded data, which I don't think is likely.

Such an optimization could be there, but I haven't noticed it if it is, though I'm certainly not an expert (more of a well-informed novice). If someone wants to set me straight on this point I'd love to know about it.

On Fri, Jul 29, 2016 at 2:37 PM DuyHai Doan wrote:

> @Eric
>
> Very interesting example. But then what is the case of row (should I say partition?) tombstones?
>
> Suppose that in your example, I issued a DELETE FROM foo WHERE pk='a'
>
> With the same SELECT statement as before, would C* be clever enough to skip reading the whole partition entirely (let's limit the example to a single SSTable)?
>
> On Fri, Jul 29, 2016 at 7:00 PM, Eric Stevens wrote:
>
>> > Sai was describing a timeout, not a failure due to the 100 K tombstone limit from cassandra.yaml. But I still might be missing things about tombstones.
>>
>> The trouble with tombstones is not the tombstones themselves; rather, it's that Cassandra has a lot of deleted data to read through in sstables in order to satisfy a query. Although the read path can optimize a read to start somewhere near the correct head of your selected data if you range-constrain your clustering key in your query, that is _not_ true for tombstoned data.
>>
>> Consider this exercise:
>>
>> CREATE TABLE foo (
>>   pk text,
>>   ck int,
>>   PRIMARY KEY ((pk), ck)
>> )
>> INSERT INTO foo (pk, ck) VALUES ('a', 1)
>> ...
>> INSERT INTO foo (pk, ck) VALUES ('a', 100000)
>>
>> $ nodetool flush
>>
>> DELETE FROM foo WHERE pk='a' AND ck < 99999
>>
>> We've now written a single "tiny" (bytes-wise) range tombstone.
>>
>> Now try to select from that table:
>>
>> SELECT * FROM foo WHERE ck > 50000 LIMIT 1
>>
>> pk | ck
>> -- | ------
>> a  | 100000
>>
>> This has to read from the first sstable, skipping over 49999 records before it can locate the first non-tombstoned cell.
>>
>> The problem isn't the size of the tombstone; tombstones themselves are cheaper (again, bytes-wise) than standard columns because they don't involve any value for the cell. The problem is that the read path cannot anticipate in advance which cells are going to be occluded by the tombstone, and in order to satisfy the query it needs to read and then discard a large number of deleted cells.
>>
>> The reason the thresholds exist in cassandra.yaml is to help guide users away from performance anti-patterns that come from selects which include a large number of tombstoned cells.
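For anyone who wants to try this, here is a minimal sketch that reproduces the exercise above against a local node. It assumes cqlsh and nodetool are on the PATH; the keyspace name test_ks is invented for illustration, and the partition key is added to the SELECT so it runs without ALLOW FILTERING.

#!/usr/bin/env bash
# Reproduce the range-tombstone read penalty described above (sketch only).
set -euo pipefail

cqlsh -e "
  CREATE KEYSPACE IF NOT EXISTS test_ks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
  CREATE TABLE IF NOT EXISTS test_ks.foo (
    pk text,
    ck int,
    PRIMARY KEY ((pk), ck)
  );"

# 100,000 clustering rows in one partition, then flush them to an sstable.
{
  echo "USE test_ks;"
  for ck in $(seq 1 100000); do
    echo "INSERT INTO foo (pk, ck) VALUES ('a', $ck);"
  done
} > /tmp/load_foo.cql
cqlsh -f /tmp/load_foo.cql
nodetool flush test_ks foo

# One small range tombstone that occludes almost the whole partition.
cqlsh -e "DELETE FROM test_ks.foo WHERE pk = 'a' AND ck < 99999;"

# Run this with TRACING ON in an interactive cqlsh session to watch the read
# walk past tens of thousands of tombstoned rows before the first live one.
cqlsh -e "SELECT * FROM test_ks.foo WHERE pk = 'a' AND ck > 50000 LIMIT 1;"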
>> On Thu, Jul 28, 2016 at 11:08 PM Alain RODRIGUEZ wrote:
>>
>>> Hi,
>>>
>>> @Eric
>>>
>>>> Large range tombstones can occupy just a few bytes but can occlude millions of records, and have the corresponding performance impact on reads. It's really not the size of the tombstone on disk that matters, but the number of records it occludes.
>>>
>>> Sai was describing a timeout, not a failure due to the 100 K tombstone limit from cassandra.yaml. But I still might be missing things about tombstones.
>>>
>>>> The read queries are continuously failing though because of the tombstones. "Request did not complete within rpc_timeout."
>>>
>>> So that is what looks weird to me. Reading 220 KB, even if it holds tombstones, should probably not take that long... Or am I wrong or missing something?
>>>
>>> Your talk looks like cool stuff :-).
>>>
>>> @Sai
>>>
>>>> The issue here was that the tombstones were not in the SSTable, but rather in the Memtable
>>>
>>> This sounds weird to me as well, knowing that memory is faster than disk and that memtables hold mutable data (so there is less stuff to read from there). Flushing might have triggered some compaction that removed tombstones, though.
>>>
>>> This still sounds very weird to me, but I am glad you solved your issue (temporarily at least).
>>>
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - alain@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> 2016-07-29 3:25 GMT+02:00 Eric Stevens:
>>>
>>>> Tombstones will not get removed even after gc_grace if bloom filters indicate that there is overlapping data with the tombstone's partition in a different sstable. This is because compaction can't be certain that the tombstone doesn't overlap data in that other sstable. If you're writing to one end of a partition while deleting off the other end (for example, you've engaged in the queue anti-pattern), your tombstones will essentially never go away.
>>>>
>>>>> 220kb worth of tombstones doesn't seem like enough to worry about.
>>>>
>>>> Large range tombstones can occupy just a few bytes but can occlude millions of records, and have the corresponding performance impact on reads. It's really not the size of the tombstone on disk that matters, but the number of records it occludes.
>>>>
>>>> You must either do a full compaction (while also not writing to the partitions being considered, and after you've forced a cluster-wide flush, and after the tombstones are gc_grace old, and assuming size-tiered and not leveled compaction) to get rid of those tombstones, or, probably easier, do something like sstable2json, remove the tombstones by hand, then json2sstable and replace the offending sstable. Note that you really have to be certain what you're doing here or you'll end up resurrecting deleted records.
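A rough outline of that sstable2json / json2sstable route, using the 2.0-era tools this thread is about. All paths and names below are illustrative, the table must not be receiving writes while you do this, and hand-editing the dump is exactly as risky as described above; verify the flags with the tools' --help before trusting this sketch.

#!/usr/bin/env bash
# Sketch of the sstable2json round trip (Cassandra 2.0.x tooling); adapt and
# test on a copy of the data first.
set -euo pipefail

KEYSPACE=my_ks      # illustrative name
TABLE=foo           # illustrative name
SSTABLE=/var/lib/cassandra/data/$KEYSPACE/$TABLE/my_ks-foo-jb-42-Data.db  # example file

# 1. Dump the sstable to JSON.
sstable2json "$SSTABLE" > /tmp/foo-42.json

# 2. Hand-edit /tmp/foo-42.json to drop the tombstone entries (deleted cells
#    and range tombstone markers). Removing the wrong entries resurrects data.

# 3. Rebuild an sstable from the edited dump (flags as documented for 2.0;
#    double-check with json2sstable --help first):
json2sstable -K "$KEYSPACE" -c "$TABLE" /tmp/foo-42.json /tmp/rebuilt/my_ks-foo-jb-42-Data.db

# 4. Stop the node, swap the rebuilt sstable in place of the original
#    component files, restart, and verify with reads before trusting it.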
>>>>
>>>> If these all sound like bad options, it's because they are, and you don't have a lot of options without changing your schema to eventually stop writing to (and especially reading from) partitions which you also do deletes on. https://issues.apache.org/jira/browse/CASSANDRA-7019 proposes to offer a better alternative, but it's still in progress.
>>>>
>>>> Shameless plug: I'm talking about my company's alternative to tombstones and TTLs at this year's Cassandra Summit: http://myeventagenda.com/sessions/1CBFC920-807D-41C1-942C-8D1A7C10F4FA/5/5#sessionID=165
>>>>
>>>> On Thu, Jul 28, 2016 at 11:07 AM sai krishnam raju potturi <pskraju88@gmail.com> wrote:
>>>>
>>>>> thanks a lot Alain. That was really great info.
>>>>>
>>>>> The issue here was that the tombstones were not in the SSTable, but rather in the Memtable. We had to do a nodetool flush and run a nodetool compact to get rid of the tombstones, a million of them. The size of the largest SSTable was actually 48 MB.
>>>>>
>>>>> This link was helpful in getting the count of tombstones in an sstable, which was 0 in our case:
>>>>> https://gist.github.com/JensRantil/063b7c56ca4a8dfe1c50
>>>>>
>>>>> The application team did not have a good model. They are working on a new data model.
>>>>>
>>>>> thanks
>>>>>
>>>>> On Wed, Jul 27, 2016 at 7:17 PM, Alain RODRIGUEZ wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I just released a detailed post about tombstones today that might be of some interest for you: http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
>>>>>>
>>>>>>> 220kb worth of tombstones doesn't seem like enough to worry about.
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> I believe you might be missing some other, bigger SSTable holding a lot of tombstones as well. Finding the biggest sstable and reading the tombstone ratio from there might be more relevant.
>>>>>>
>>>>>> You should also give "unchecked_tombstone_compaction" set to true a try, rather than tuning the other options so aggressively. The "single SSTable compaction" section of my post might help you with this issue: http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html#single-sstable-compaction
>>>>>>
>>>>>> Other thoughts:
>>>>>>
>>>>>> Also, if you use TTLs and time series, using TWCS instead of STCS could be more efficient at evicting tombstones.
>>>>>>
>>>>>>> we have a columnfamily that has around 1000 rows, with one row that is really huge (a million columns)
>>>>>>
>>>>>> I am sorry to say that this model does not look that great. Imbalances might become an issue as a few nodes will handle a lot more load than the rest of the nodes. Also, even if this is getting improved in newer versions of Cassandra, wide rows are something you want to avoid while using 2.0.14 (which has not been supported for about a year now). I know it is never easy and never the right time, but maybe you should consider upgrading both your model and your version of Cassandra (regardless of whether you manage to solve this issue with "unchecked_tombstone_compaction").
>>>>>>
>>>>>> Good luck,
>>>>>>
>>>>>> C*heers,
>>>>>> -----------------------
>>>>>> Alain Rodriguez - alain@thelastpickle.com
>>>>>> France
>>>>>>
>>>>>> The Last Pickle - Apache Cassandra Consulting
>>>>>> http://www.thelastpickle.com
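Hedged sketches of the two table-level changes suggested above, written for cqlsh; the keyspace and table names are invented. Note that unchecked_tombstone_compaction is a standard STCS/LCS sub-option since 2.0.9, while TWCS only ships with Cassandra 3.0.8/3.8 and later (on 2.0.x it would have to be installed as a third-party compaction strategy).

# Allow single-sstable tombstone compactions even when sstables overlap
# (keyspace/table names are illustrative).
cqlsh -e "
  ALTER TABLE my_ks.foo WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'unchecked_tombstone_compaction': 'true',
    'tombstone_threshold': '0.2'
  };"

# For TTL'd time series on a new enough version, TWCS groups data into time
# windows so whole windows of tombstones/expired data drop together.
cqlsh -e "
  ALTER TABLE my_ks.events WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
  };"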
>>>>>> 2016-07-28 0:00 GMT+02:00 sai krishnam raju potturi <pskraju88@gmail.com>:
>>>>>>
>>>>>>> The read queries are continuously failing though because of the tombstones: "Request did not complete within rpc_timeout."
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> On Wed, Jul 27, 2016 at 5:51 PM, Jeff Jirsa <jeff.jirsa@crowdstrike.com> wrote:
>>>>>>>
>>>>>>>> 220kb worth of tombstones doesn't seem like enough to worry about.
>>>>>>>>
>>>>>>>> From: sai krishnam raju potturi
>>>>>>>> Reply-To: "user@cassandra.apache.org"
>>>>>>>> Date: Wednesday, July 27, 2016 at 2:43 PM
>>>>>>>> To: Cassandra Users
>>>>>>>> Subject: Re: Re : Purging tombstones from a particular row in SSTable
>>>>>>>>
>>>>>>>> and also the sstable in question is only about 220 KB in size.
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> On Wed, Jul 27, 2016 at 5:41 PM, sai krishnam raju potturi <pskraju88@gmail.com> wrote:
>>>>>>>>
>>>>>>>> it's set to 1800, Vinay.
>>>>>>>>
>>>>>>>>   bloom_filter_fp_chance=0.010000 AND
>>>>>>>>   caching='KEYS_ONLY' AND
>>>>>>>>   comment='' AND
>>>>>>>>   dclocal_read_repair_chance=0.100000 AND
>>>>>>>>   gc_grace_seconds=1800 AND
>>>>>>>>   index_interval=128 AND
>>>>>>>>   read_repair_chance=0.000000 AND
>>>>>>>>   replicate_on_write='true' AND
>>>>>>>>   populate_io_cache_on_flush='false' AND
>>>>>>>>   default_time_to_live=0 AND
>>>>>>>>   speculative_retry='99.0PERCENTILE' AND
>>>>>>>>   memtable_flush_period_in_ms=0 AND
>>>>>>>>   compaction={'min_sstable_size': '1024', 'tombstone_threshold': '0.01', 'tombstone_compaction_interval': '1800', 'class': 'SizeTieredCompactionStrategy'} AND
>>>>>>>>   compression={'sstable_compression': 'LZ4Compressor'};
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> On Wed, Jul 27, 2016 at 5:34 PM, Vinay Kumar Chella <vinaykumarcse@gmail.com> wrote:
>>>>>>>>
>>>>>>>> What is your GC_grace_seconds set to?
>>>>>>>>
>>>>>>>> On Wed, Jul 27, 2016 at 1:13 PM, sai krishnam raju potturi <pskraju88@gmail.com> wrote:
>>>>>>>>
>>>>>>>> thanks Vinay and DuyHai.
>>>>>>>>
>>>>>>>> We are using version 2.0.14. I did a "user defined compaction" following the instructions in the link below; the tombstones still persist even after that.
>>>>>>>>
>>>>>>>> https://gist.github.com/jeromatron/e238e5795b3e79866b83
>>>>>>>>
>>>>>>>> Also, we changed tombstone_compaction_interval to 1800 and tombstone_threshold to 0.1, but it did not help.
>>>>>>>>
>>>>>>>> thanks
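One way to sanity-check whether the single-sstable tombstone compaction settings above can ever fire is to look at each sstable's estimated droppable tombstone ratio (it has to exceed tombstone_threshold, and the tombstones must be older than gc_grace_seconds). A hedged sketch using the sstablemetadata tool from tools/bin of the Cassandra distribution; the data directory path and table names are illustrative.

# Print the estimated droppable tombstone ratio for each sstable of a table
# (run on the node that owns the data; adjust the path for your install).
for f in /var/lib/cassandra/data/my_ks/foo/*-Data.db; do
  ratio=$(sstablemetadata "$f" | awk '/Estimated droppable tombstones/ {print $NF}')
  echo "$f droppable_tombstones=$ratio"
done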
>>>>>>>> On Wed, Jul 27, 2016 at 4:05 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>>>>>>>
>>>>>>>> This feature is also exposed directly in nodetool from Cassandra version 3.4:
>>>>>>>>
>>>>>>>> nodetool compact --user-defined <SSTable file>
>>>>>>>>
>>>>>>>> On Wed, Jul 27, 2016 at 9:58 PM, Vinay Chella <vchella@netflix.com> wrote:
>>>>>>>>
>>>>>>>> You can run file-level compaction using JMX to get rid of tombstones in one SSTable. Ensure you set gc_grace_seconds such that
>>>>>>>>
>>>>>>>> current time >= deletion (tombstone) time + gc_grace_seconds
>>>>>>>>
>>>>>>>> File-level compaction:
>>>>>>>>
>>>>>>>> /usr/bin/java -jar cmdline-jmxclient-0.10.3.jar - localhost:{port} org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction="'${KEYSPACE}','${SSTABLEFILENAME}'"
>>>>>>>>
>>>>>>>> On Wed, Jul 27, 2016 at 11:59 AM, sai krishnam raju potturi <pskraju88@gmail.com> wrote:
>>>>>>>>
>>>>>>>> hi;
>>>>>>>>
>>>>>>>> we have a columnfamily that has around 1000 rows, with one row that is really huge (a million columns). 95% of the row consists of tombstones. Since there exists just one SSTable, no compaction is going to be kicked off. Is there any way we can get rid of the tombstones in that row?
>>>>>>>>
>>>>>>>> Neither user-defined compaction nor nodetool compact had any effect. Any ideas, folks?
>>>>>>>>
>>>>>>>> thanks
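For completeness, the sequence that reportedly cleared the tombstones in this thread boils down to the following; the keyspace and table names are invented, and keep in mind that on size-tiered compaction a major compaction leaves one large sstable, so use it deliberately.

# Flush memtables so the deletions (tombstones) are written to sstables.
nodetool flush my_ks foo

# Tombstones only become droppable once they are older than gc_grace_seconds
# (1800s on the table above), so wait at least that long after the deletes.
sleep 1800

# Major compaction of the column family; on STCS this rewrites the data into
# a single sstable and drops the droppable tombstones along the way.
nodetool compact my_ks foo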