cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5906) Avoid allocating over-large bloom filters
Date Wed, 18 Sep 2013 16:37:52 GMT


Jonathan Ellis commented on CASSANDRA-5906:

The approach looks good to me.  I vote we ship it and have QA test a few different kinds of
compactions to make sure the error rate stays low, but it looks like the defaults work pretty
damn well.  (And we should be okay on thread safety; only one thread builds it, after which
it's immutable.)

Two comments -- 

# What's going on with {{offer}} that we need to {{getArray}} instead of handing it the ByteBuffer?
 Would prefer to fix that in HLL rather than copy out the array.
# Is 50K per sstable acceptable for LCS?  That's 500MB if we have 10k sstables, a count that
is within our goal of 5+ TB.  I'd be more comfortable if we pull this in only at compaction time.
Note that we have precedent for doing this in the ancestors (CASSANDRA-5342), but if we keep
adding "compaction time" metadata the same way then things are going to get messy; some cleanup
is probably in order.
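
For reference, the memory figure in the second comment can be checked with a few lines of Java. This is only a sketch of the arithmetic: the class and method names are made up for illustration, and it assumes decimal units (50K = 50,000 bytes) for the per-sstable cost quoted above.

```java
public class CardinalityOverhead {
    // Assumption: roughly 50,000 bytes of estimator state kept per sstable,
    // the "50K per sstable" figure from the comment above.
    static final long BYTES_PER_SSTABLE = 50_000L;

    // Total bytes held in memory if every sstable keeps its estimator loaded.
    static long totalBytes(long sstableCount) {
        return BYTES_PER_SSTABLE * sstableCount;
    }

    public static void main(String[] args) {
        long total = totalBytes(10_000L); // 10k sstables, as in the comment
        System.out.println(total / 1_000_000L + " MB");
    }
}
```

Loading the data only at compaction time would keep this cost off the heap during normal reads, which is the concern raised in the comment.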

> Avoid allocating over-large bloom filters
> -----------------------------------------
>                 Key: CASSANDRA-5906
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Yuki Morishita
>             Fix For: 2.0.1
> We conservatively estimate the number of partitions post-compaction to be the total number
> of partitions pre-compaction.  That is, we assume the worst-case scenario of no partition
> overlap at all.
> This can result in substantial memory wasted in sstables resulting from highly overlapping
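
To make the cost of the worst-case estimate concrete, here is a rough Java sketch. It uses the textbook bloom filter sizing formula m = -n ln(p) / (ln 2)^2, which is an assumption and not necessarily how Cassandra sizes its filters; the class name, partition counts, and false-positive rate are all hypothetical.

```java
public class BloomFilterSizing {
    // Textbook optimal bit count for n keys at false-positive probability p.
    // Assumption: this is the generic formula, not Cassandra's own sizing code.
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    public static void main(String[] args) {
        long perSstable = 1_000_000L; // hypothetical partitions in each of two input sstables
        double fpChance = 0.01;       // hypothetical target false-positive rate

        // Worst-case estimate: no overlap, so the output holds the sum of the inputs.
        long worstCase = optimalBits(2 * perSstable, fpChance);
        // If the two inputs overlap completely, the output holds only perSstable partitions.
        long needed = optimalBits(perSstable, fpChance);

        System.out.printf("worst-case: %d bytes, needed: %d bytes%n",
                worstCase / 8, needed / 8);
    }
}
```

With two fully overlapping inputs the worst-case filter comes out about twice as large as necessary, and the gap grows with the number of overlapping sstables in the compaction.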
