cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5722) Cleanup should skip sstables that don't contain data outside a nodes ranges
Date Tue, 16 Jul 2013 21:20:50 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710268#comment-13710268 ]

Jonathan Ellis commented on CASSANDRA-5722:
-------------------------------------------

Sorry, I was really fixating on trying to relate your comment to that commit. I see what
you mean now.

That's actually the way it used to work, pre-CASSANDRA-4710. TL;DR: in Daniel's workload,
the cost of decorating keys was indeed higher than the extra work of scanning the extra rows.

If you think about it, you can see why that might be so: we only have to scan extra rows
on a Bloom filter false positive. By the time we start looping through rows, the common case
is that the row we're looking for exists.

Now that we allow disabling the BF entirely, that might not always be the case. But if you're
disabling your BF, I have to assume you know what you're doing and are prepared for the
consequences. :)
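The following toy Java sketch makes the false-positive argument concrete. It is not Cassandra's code. The class `BloomGuardedLookup`, the hypothetical `ToyBloomFilter` (double hashing over `String.hashCode()`), and a sorted `List<String>` standing in for an sstable's rows are all illustrative assumptions. A "definitely absent" answer from the filter skips the scan entirely, so rows are only walked for keys that are present or that hit a false positive. Passing `null` models a disabled filter, where every miss pays for the scan.

```java
import java.util.BitSet;
import java.util.List;

// Toy model only: a Bloom filter in front of a sorted list of row keys that
// stands in for an sstable. Not Cassandra's implementation.
class BloomGuardedLookup {

    // Hypothetical minimal Bloom filter using double hashing over String.hashCode().
    static final class ToyBloomFilter {
        private final BitSet bits;
        private final int size;
        private final int hashes;

        ToyBloomFilter(int size, int hashes) {
            this.size = size;
            this.hashes = hashes;
            this.bits = new BitSet(size);
        }

        // i-th bit position for a key: h1 + i * h2, reduced into [0, size).
        private int position(String key, int i) {
            int h1 = key.hashCode();
            int h2 = (h1 >>> 16) | 1; // force an odd, non-zero step
            return Math.floorMod(h1 + i * h2, size);
        }

        void add(String key) {
            for (int i = 0; i < hashes; i++) {
                bits.set(position(key, i));
            }
        }

        // false means "definitely absent"; true means "maybe present".
        boolean mightContain(String key) {
            for (int i = 0; i < hashes; i++) {
                if (!bits.get(position(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    // Whether the key was found and how many rows were examined to decide.
    record LookupResult(boolean found, int rowsScanned) {}

    // A null filter models a table whose Bloom filter has been disabled.
    static LookupResult lookup(ToyBloomFilter filter, List<String> sortedRows, String key) {
        if (filter != null && !filter.mightContain(key)) {
            return new LookupResult(false, 0); // definite miss: no rows touched
        }
        // Either the key is present, this is a false positive, or there is no filter:
        // walk the rows until we hit the key or pass its sort position.
        int scanned = 0;
        for (String row : sortedRows) {
            scanned++;
            int cmp = row.compareTo(key);
            if (cmp == 0) {
                return new LookupResult(true, scanned);
            }
            if (cmp > 0) {
                break; // rows are sorted, so the key cannot appear later
            }
        }
        return new LookupResult(false, scanned);
    }
}
```

Because a Bloom filter never gives false negatives, a lookup with the filter never scans more rows than the same lookup without it. It only avoids work on misses the filter can rule out.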
                
> Cleanup should skip sstables that don't contain data outside a nodes ranges
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5722
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5722
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Nick Bailey
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.1
>
>         Attachments: 0001-Skip-cleanup-when-unneeded.patch
>
>
> Right now cleanup is optimized to simply delete sstables that *only* contain data that
> doesn't belong on the node. For all other sstables, though, it reads them, checks each row,
> and writes out new sstables.
> Cleanup could be optimized to look at an sstable, determine that all data within the
> sstable belongs on the node, and therefore skip rewriting that sstable. This would make
> cleanup essentially a no-op in the case where all data on a node belongs on that node.
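The proposal amounts to a three-way decision per sstable: skip it, delete it, or rewrite it. Below is a minimal Java sketch under a simplified model, not Cassandra's real `Range`/`Token` classes. It assumes tokens are `long`s and each owned range is a non-wrapping interval `(left, right]`. It also assumes the owned ranges have already been merged, so adjacent ranges are combined, and that the sstable's first and last tokens are known. `CleanupDecision`, `TokenRange`, and `decide` are hypothetical names.

```java
import java.util.List;

// Simplified model, not Cassandra's Range/Token classes: tokens are longs and
// every owned range is a non-wrapping, half-open interval (left, right].
class CleanupDecision {

    record TokenRange(long left, long right) {
        // True if every token in [first, last] falls inside (left, right].
        boolean containsSpan(long first, long last) {
            return first > left && last <= right;
        }

        // True if at least one token in [first, last] falls inside (left, right].
        boolean intersects(long first, long last) {
            return first <= right && last > left;
        }
    }

    enum Action { SKIP, DELETE, REWRITE }

    // Decide what cleanup should do with an sstable whose keys span the tokens
    // [firstToken, lastToken], given the node's owned ranges (assumed merged).
    static Action decide(long firstToken, long lastToken, List<TokenRange> owned) {
        boolean anyOverlap = false;
        for (TokenRange range : owned) {
            if (range.containsSpan(firstToken, lastToken)) {
                return Action.SKIP;     // everything belongs here: leave the sstable alone
            }
            if (range.intersects(firstToken, lastToken)) {
                anyOverlap = true;      // some data belongs here, some may not
            }
        }
        // No owned data at all: drop the whole sstable, as cleanup already does.
        return anyOverlap ? Action.REWRITE : Action.DELETE;
    }
}
```

If the owned ranges are not merged and an sstable's span straddles two adjacent ones, the sketch falls back to `REWRITE`. That outcome is safe but misses the optimization.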

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
