cassandra-user mailing list archives

From Analia Lorenzatto <analialorenza...@gmail.com>
Subject Re: Question about how to remove data
Date Sat, 22 Aug 2015 17:11:19 GMT
Thanks guys for the answers!

Saludos / Regards.

Analía Lorenzatto.

"Happiness is not something ready made. It comes from your own actions" by
Dalai Lama


On 21 Aug 2015 2:31 pm, "Sebastian Estevez" <sebastian.estevez@datastax.com>
wrote:

> To clarify, you do not need a TTL for deletes to be compacted away in
> Cassandra. When you delete, we create a tombstone, which will remain in the
> system for __at least__ gc_grace_seconds. We wait this long to give the
> tombstone a chance to make it to all replica nodes. The best practice is to
> run repairs at least once every gc_grace_seconds to prevent edge cases
> where data comes back to life (i.e. the tombstone was never sent to one of
> your replicas, and when the tombstones and data are removed from the other
> two replicas, all that is left is the old value).
>
> __at least__ is the key phrase in the previous paragraph: there are more
> conditions that need to be met before a tombstone actually gets
> cleaned up. As with most things in Cassandra, these conditions are
> configurable (via the following compaction sub-properties):
>
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_configure_compaction_t.html
>
> All the best,
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>
> On Thu, Aug 20, 2015 at 4:13 PM, Daniel Chia <danchia@coursera.org> wrote:
>
>> The TTL shouldn't matter if you deleted the data since, to my
>> understanding, the delete should shadow the data, signaling to C* that the
>> data is a candidate for removal on compaction.
>>
>> Others might know better, but it could very well be the fact that
>> gc_grace_seconds is 0 that is causing your problems. You could also
>> use sstable2json to inspect the raw contents of the SSTables on disk
>> and see why the data is still there.
>>
>> Thanks,
>> Daniel
>>
>> On Thu, Aug 20, 2015 at 12:55 PM, Analia Lorenzatto <
>> analialorenzatto@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Daniel, I am using Size Tiered compaction.
>>>
>>> My concern is that I do not have a TTL defined on the column family,
>>> and I am not able to create one. Perhaps the "deleted
>>> data" is never actually going to be removed?
>>>
>>> Thanks a lot!
>>>
>>>
>>> On Thu, Aug 20, 2015 at 4:24 AM, Daniel Chia <danchia@coursera.org>
>>> wrote:
>>>
>>>> Is this column family using Leveled (LCS) or Size Tiered compaction?
>>>> Manually running compaction on LCS doesn't do anything until C* 2.2 (
>>>> https://issues.apache.org/jira/browse/CASSANDRA-7272)
>>>>
>>>> Thanks,
>>>> Daniel
>>>>
>>>> On Wed, Aug 19, 2015 at 6:56 PM, Analia Lorenzatto <
>>>> analialorenzatto@gmail.com> wrote:
>>>>
>>>>> Hello Michael,
>>>>>
>>>>> Thanks for responding!
>>>>>
>>>>> I do not have snapshots on any node of the cluster.
>>>>>
>>>>> Saludos / Regards.
>>>>>
>>>>> Analía Lorenzatto.
>>>>>
>>>>>
>>>>>
>>>>> On 19 Aug 2015 6:19 pm, "Laing, Michael" <michael.laing@nytimes.com>
>>>>> wrote:
>>>>>
>>>>>> Possibly you have snapshots? If so, use nodetool clearsnapshot to remove them.
>>>>>>
>>>>>> On Wed, Aug 19, 2015 at 4:54 PM, Analia Lorenzatto <
>>>>>> analialorenzatto@gmail.com> wrote:
>>>>>>
>>>>>>> Hello guys,
>>>>>>>
>>>>>>> I have a Cassandra 2.1 cluster of 4 nodes.
>>>>>>>
>>>>>>> I removed a lot of data from a column family, then I manually ran a
>>>>>>> compaction on this column family on every node. After doing that, if I
>>>>>>> query that data, Cassandra correctly says it is not there. But the
>>>>>>> space used on disk is exactly the same as before removing the data.
>>>>>>>
>>>>>>> Also, I realized that gc_grace_seconds = 0. Some people on the
>>>>>>> internet say that this could produce zombie data. What do you think?
>>>>>>>
>>>>>>> I do not have a TTL defined on the column family, and I am not able
>>>>>>> to create one. So my question is: given that I do not have a TTL
>>>>>>> defined, is the data going to be removed? Or is the deleted data
>>>>>>> never actually going to be deleted because I do not have a TTL?
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>> --
>>>>>>> Saludos / Regards.
>>>>>>>
>>>>>>> Analía Lorenzatto.
>>>>>>>
>>>>>>> “It's possible to commit no errors and still lose. That is not
>>>>>>> weakness. That is life.” By Captain Jean-Luc Picard.
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>> --
>>> Saludos / Regards.
>>>
>>> Analía Lorenzatto.
>>>
>>>
>>
>>
>
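
The zombie-data failure mode discussed in this thread (a tombstone purged on two replicas but never delivered to the third, with gc_grace_seconds = 0) can be reproduced in a toy model. The sketch below is a simplified, hypothetical Python model, not Cassandra's implementation: it assumes last-write-wins reconciliation with deletes winning timestamp ties, treats a tombstone as purgeable once gc_grace_seconds have elapsed since it was written, and ignores the overlapping-SSTable checks and compaction sub-properties a real compaction applies. The names Cell, reconcile, compact, read and scenario are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """One version of a column value on a replica (hypothetical, simplified model)."""
    write_ts: int                              # write timestamp used for reconciliation
    value: Optional[str]                       # None marks a tombstone (a delete)
    local_deletion_time: Optional[int] = None  # when the tombstone was written, in seconds

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def reconcile(cells):
    """Last-write-wins: highest timestamp wins; on a tie the tombstone wins (assumed)."""
    cells = [c for c in cells if c is not None]
    if not cells:
        return None
    return max(cells, key=lambda c: (c.write_ts, c.is_tombstone))


def compact(cells, now, gc_grace_seconds):
    """Merge one replica's cells, dropping the winning tombstone once gc grace has passed.

    Simplification: ignores the overlapping-SSTable checks a real compaction performs.
    """
    winner = reconcile(cells)
    if winner is None:
        return []
    if winner.is_tombstone and now >= winner.local_deletion_time + gc_grace_seconds:
        return []  # the tombstone and the data it shadowed are both gone
    return [winner]


def read(replicas):
    """Read from every replica and return the reconciled live value, or None if deleted."""
    winner = reconcile([reconcile(cells) for cells in replicas.values()])
    return None if winner is None or winner.is_tombstone else winner.value


def scenario(gc_grace_seconds, repair_before_compaction):
    insert = Cell(write_ts=1, value="old")
    delete = Cell(write_ts=2, value=None, local_deletion_time=100)
    # The delete reaches replicas A and B but is never delivered to C.
    replicas = {"A": [insert, delete], "B": [insert, delete], "C": [insert]}
    if repair_before_compaction:
        replicas["C"] = replicas["C"] + [delete]  # repair streams the missing tombstone
    now = 101  # compaction runs one second after the delete
    for name in ("A", "B"):
        replicas[name] = compact(replicas[name], now, gc_grace_seconds)
    return read(replicas)
```

In this model, scenario(0, False) returns "old": with gc_grace_seconds = 0, compaction on A and B purges the tombstone together with the shadowed value, so the stale copy on C wins the read. Both scenario(0, True) (repair delivered the tombstone first) and scenario(864000, False) (the default 10-day grace period has not yet elapsed) return None, because a tombstone still shadows the old value.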
