incubator-cassandra-user mailing list archives

From Stephen Mullins <smull...@thebrighttag.com>
Subject Re: Strange row expiration behavior
Date Tue, 23 Oct 2012 14:05:48 GMT
Thanks, Aaron. My reply is inline below:

On Tue, Oct 23, 2012 at 2:38 AM, aaron morton <aaron@thelastpickle.com> wrote:

> Performing these steps results in the rows still being present using *cassandra-cli
> list*.
>
> I assume you are saying the row key is listed without any columns, a.k.a. a
> ghost row.
>
Correct.

>
>  What gets really odd is if I add these steps it works
>
> That's working as designed.
>
> gc_grace_seconds does not specify when tombstones must be purged, rather
> it specifies the minimum duration the tombstone must be stored. It's really
> saying "if you compact this column X seconds after the delete you can purge
> the tombstone".
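The rule described above can be written as a one-line check. This is a minimal sketch, not Cassandra's code: the function name `tombstone_purgeable` is hypothetical, times are plain integer seconds, and treating the boundary as strict (`>`) is an assumption.

```python
def tombstone_purgeable(local_deletion_time: int, compaction_time: int,
                        gc_grace_seconds: int) -> bool:
    """Return True if a compaction running at compaction_time may drop the tombstone.

    gc_grace_seconds is a minimum retention period, not a deadline: this
    check only matters when some compaction actually reads the tombstone.
    """
    # ASSUMPTION: strict inequality at the boundary.
    return compaction_time > local_deletion_time + gc_grace_seconds


# A delete at t=100 with gc_grace_seconds=10:
print(tombstone_purgeable(100, 105, 10))  # compaction runs too early
print(tombstone_purgeable(100, 120, 10))  # grace period has passed
```

The check is only ever evaluated when a compaction processes the tombstone; if no compaction touches it, the tombstone stays on disk however long ago the grace period ended.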
>
> Minor / automatic compaction will kick in if there are (by default) 4
> SSTables of roughly the same size, and it will only purge tombstones if all
> fragments of the row exist in the SSTables being compacted.
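A toy model of the two minor-compaction conditions above, for illustration only. `SSTable`, `size_tiered_buckets`, `minor_compaction_candidates` and `all_fragments_present` are hypothetical names; the 0.5x to 1.5x "similar size" window is an assumption, and the real size-tiered strategy has further details this sketch ignores. `all_fragments_present` is a necessary condition for purging, not a sufficient one: gc_grace_seconds must still have elapsed.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SSTable:
    name: str
    size_mb: float
    row_keys: Set[str] = field(default_factory=set)  # rows with a fragment here


def size_tiered_buckets(sstables: List[SSTable], bucket_low: float = 0.5,
                        bucket_high: float = 1.5) -> List[List[SSTable]]:
    """Group SSTables whose size is within [low, high] times a bucket's average."""
    buckets: List[List[SSTable]] = []
    for table in sorted(sstables, key=lambda s: s.size_mb):
        for bucket in buckets:
            avg = sum(s.size_mb for s in bucket) / len(bucket)
            if bucket_low * avg <= table.size_mb <= bucket_high * avg:
                bucket.append(table)
                break
        else:  # no existing bucket fits, so start a new one
            buckets.append([table])
    return buckets


def minor_compaction_candidates(sstables: List[SSTable],
                                min_threshold: int = 4) -> List[List[SSTable]]:
    """Buckets big enough to trigger a minor compaction (default threshold 4)."""
    return [b for b in size_tiered_buckets(sstables) if len(b) >= min_threshold]


def all_fragments_present(row_key: str, compacting: List[SSTable],
                          all_sstables: List[SSTable]) -> bool:
    """True if no SSTable left out of this compaction holds a fragment of row_key."""
    compacting_names = {s.name for s in compacting}
    outside = [s for s in all_sstables if s.name not in compacting_names]
    return not any(row_key in s.row_keys for s in outside)
```

For example, with four 10 MB SSTables and one 200 MB SSTable, only the four small ones form a candidate bucket. If `arow` also has a fragment in the 200 MB file, that minor compaction cannot purge the row's tombstones, whereas a compaction covering every SSTable can (once gc_grace_seconds has passed).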
>
> Major / manual compaction compacts all the sstables, and so purges the
> tombstones IF gc_grace_seconds has expired.
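Major compaction can be modeled the same way: merge every SSTable, then drop any tombstone whose grace period has passed. A row left with no cells is simply not written out, which is how a ghost row disappears. This is a simplified sketch under stated assumptions: the cell layout `(write_timestamp, value, local_deletion_time)` and the name `major_compact` are hypothetical, duplicates are reconciled by newest write timestamp only, and row-level tombstones are not modeled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cell layout: (write_timestamp, value, local_deletion_time).
# local_deletion_time is None for a live cell, otherwise the cell is a tombstone.
Cell = Tuple[int, Optional[str], Optional[int]]
Table = Dict[str, Dict[str, Cell]]  # row key -> column name -> cell


def major_compact(sstables: List[Table], now: int, gc_grace_seconds: int) -> Table:
    """Merge all SSTables and drop tombstones older than gc_grace_seconds."""
    merged: Table = {}
    for table in sstables:
        for row_key, columns in table.items():
            row = merged.setdefault(row_key, {})
            for name, cell in columns.items():
                # Reconcile duplicates by keeping the newest write.
                if name not in row or cell[0] > row[name][0]:
                    row[name] = cell

    result: Table = {}
    for row_key, columns in merged.items():
        kept = {name: cell for name, cell in columns.items()
                if cell[2] is None or now <= cell[2] + gc_grace_seconds}
        if kept:  # a row with nothing left is not written out at all
            result[row_key] = kept
    return result
```

Because every SSTable takes part, the "all fragments present" condition from minor compaction always holds here, and only the gc_grace_seconds check decides what survives.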
>
> In your first example compaction had not run so the tombstones stayed on
> disk. In the second the major compaction purged expired tombstones.
>
In the first example, I do run compaction at step 7 through nodetool,
after gc_grace_seconds has expired. Additionally, if I skip the manual
row delete in the second example, the ghost rows are not cleaned up. I
want to be sure that, in our production environment, I won't have to
manually delete empty rows after their columns expire, but I can't get an
example working that shows this.

>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/10/2012, at 2:49 PM, Stephen Mullins <smullins@thebrighttag.com>
> wrote:
>
> Hello, I'm seeing behavior on Cassandra v1.0.12 that I can't explain. I'm
> trying to test removing rows after all columns have expired. I've read the
> following:
> http://wiki.apache.org/cassandra/DistributedDeletes
> http://wiki.apache.org/cassandra/MemtableSSTable
> https://issues.apache.org/jira/browse/CASSANDRA-2795
>
> And came up with a test, meant to demonstrate empty-row removal, that does
> the following:
>
>    1. create a keyspace
>    2. create a column family with gc_grace_seconds=10 (an arbitrary small number)
>    3. insert a couple rows with ttl=5 (again, just a small number)
>    4. use nodetool to flush the column family
>    5. sleep >10 seconds
>    6. ensure the columns are removed with *cassandra-cli list*
>    7. use nodetool to compact the keyspace
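The timing in steps 1-7 can be checked with simple clock arithmetic. This is a hedged sketch, not a diagnosis: it assumes (the thread does not state this) that an expired TTL column's gc_grace_seconds window starts when the column expires rather than when it was written, and it ignores the time spent inserting and flushing. `earliest_purge_time` and `compaction_can_purge` are hypothetical helpers.

```python
def earliest_purge_time(write_time: float, ttl: int, gc_grace_seconds: int) -> float:
    """Earliest moment a compaction could drop an expired TTL column.

    ASSUMPTION: the grace period is counted from the expiry time
    (write_time + ttl), not from the write time.
    """
    expires_at = write_time + ttl
    return expires_at + gc_grace_seconds


def compaction_can_purge(compact_at: float, write_time: float = 0.0,
                         ttl: int = 5, gc_grace_seconds: int = 10) -> bool:
    """True if a compaction started at compact_at is past the grace period.

    Defaults mirror the test above: insert at t=0, ttl=5, gc_grace=10.
    """
    return compact_at > earliest_purge_time(write_time, ttl, gc_grace_seconds)


# Step 5 sleeps "more than 10 seconds"; try a few compaction times.
for compact_at in (11, 14, 16):
    print(compact_at, compaction_can_purge(compact_at))
```

Under that assumption, columns written at t=0 with ttl=5 and a 10-second grace period first become purgeable after t=15, so a compaction issued about 11 seconds after the insert would still be too early. Whether v1.0.12 actually counts the grace period this way would need to be confirmed against the server.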
>
> Performing these steps results in the rows still being present using *cassandra-cli
> list*. What gets really odd is if I add these steps it works:
>
>    1. sleep 5 seconds
>    2. use cassandra-cli to *del mycf[arow]*
>    3. use nodetool to flush the column family
>    4. use nodetool to compact the keyspace
>
> I don't understand why the first set of steps (1-7) doesn't remove the
> empty row, nor do I understand why the explicit row delete somehow makes
> this work. I have all this in a script that I could attach if that's
> appropriate. Is there something wrong with the steps that I have?
>
> Thanks,
> Stephen
>
>
>
