cassandra-commits mailing list archives

From "Aaron Morton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2589) row deletes do not remove columns
Date Mon, 02 May 2011 09:00:03 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027595#comment-13027595 ]

Aaron Morton commented on CASSANDRA-2589:
-----------------------------------------

To clarify, I'm suggesting that if ColumnFamily.markedForDeletionAt is set, then ColumnFamilySerializer.serializeForSSTable()
should only write columns with a higher timestamp. This would reduce disk usage and remove
the need to filter out columns that will be ignored.

This would have the minimum impact. 

This is in addition to the fix for CASSANDRA-2590
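
A minimal sketch of the idea, not the actual Cassandra code: the class RowDeleteFilter, its Column type, and the liveColumns() helper are hypothetical names used only for illustration. It assumes that an undeleted CF carries a marker of Long.MIN_VALUE, so that every column passes the check, and it uses a strict "higher than" comparison as described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the serialisation step; not Cassandra's real classes.
class RowDeleteFilter {

    // Simplified column: just a name and a write timestamp.
    static final class Column {
        final String name;
        final long timestamp;

        Column(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    // Returns only the columns that would be written to disk: those whose
    // timestamp is strictly higher than the row-level deletion marker.
    // Assumption: Long.MIN_VALUE means "row was never deleted".
    static List<Column> liveColumns(List<Column> columns, long markedForDeletionAt) {
        List<Column> live = new ArrayList<>();
        for (Column c : columns) {
            if (c.timestamp > markedForDeletionAt) {
                live.add(c);
            }
        }
        return live;
    }
}
```

With columns at timestamps 5, 10 and 11 and a row delete at 10, only the column written at 11 would reach the SSTable; the in-memory CF is left untouched.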

> row deletes do not remove columns
> ---------------------------------
>
>                 Key: CASSANDRA-2589
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2589
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.5, 0.8 beta 1
>            Reporter: Aaron Morton
>            Assignee: Aaron Morton
>            Priority: Minor
>
> When a row delete is issued, CF.delete() sets the localDeletionTime and markedForDeleteAt values but does not remove columns which have a lower timestamp. As a result:
> # Memory which could be freed is held on to (probably not too bad, as it is already counted).
> # The deleted columns are serialised to disk, along with the CF info to say they are no longer valid.
> # NamesQueryFilter and SliceQueryFilter have to do more work, as they filter out the irrelevant columns using QueryFilter.isRelevant().
> # Columns written with a lower timestamp after the deletion are also added to the CF without checking markedForDeletionAt.
> This can cause RR to fail; I will create another ticket for that and link it. This ticket is for a fix that removes the columns.
> Two options I could think of:
> # Check for deletion when serialising to SSTable and ignore columns if they have a lower timestamp. Otherwise leave as is, so dead columns stay in memory.
> # Ensure at all times that, if the CF is deleted, all columns it contains have a higher timestamp.
> ## I *think* this would include all column types (DeletedColumn as well) as the CF deletion has the same effect. But not sure.
> ## Deleting (potentially) all columns in delete() will take time. Could track the highest timestamp in the CF so the normal case of deleting all cols does not need to iterate.
>  
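
The second option in the quoted description, including the "track the highest timestamp" shortcut, could look roughly like the sketch below. TrackedRow and all of its members are hypothetical names, columns are reduced to a name and a timestamp, and thread safety is ignored; it is an illustration under those assumptions, not the ColumnFamily implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, single-threaded model of a row that keeps itself consistent
// with its own deletion marker.
class TrackedRow {
    private final Map<String, Long> columns = new HashMap<>(); // name -> timestamp
    private long maxTimestamp = Long.MIN_VALUE;      // highest timestamp stored
    private long markedForDeleteAt = Long.MIN_VALUE; // MIN_VALUE = never deleted

    // Adds a column unless the row delete already covers it.
    void addColumn(String name, long timestamp) {
        if (timestamp <= markedForDeleteAt) {
            return; // written "before" the delete, so it is already dead
        }
        Long existing = columns.get(name);
        if (existing == null || timestamp > existing) {
            columns.put(name, timestamp);
            maxTimestamp = Math.max(maxTimestamp, timestamp);
        }
    }

    // Applies a row delete and drops every column it covers.
    void delete(long deleteAt) {
        if (deleteAt <= markedForDeleteAt) {
            return; // an equal or newer delete has already been applied
        }
        markedForDeleteAt = deleteAt;
        if (maxTimestamp <= deleteAt) {
            // Common case: everything is older than the delete, no iteration.
            columns.clear();
            maxTimestamp = Long.MIN_VALUE;
        } else {
            // Some columns are newer; the newest one survives, so
            // maxTimestamp stays valid.
            columns.values().removeIf(ts -> ts <= deleteAt);
        }
    }

    int size() {
        return columns.size();
    }
}
```

The check in addColumn() also covers the case from the description where a column with a lower timestamp arrives after the delete.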

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
