cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5906) Avoid allocating over-large bloom filters
Date Thu, 19 Sep 2013 12:32:52 GMT


Jonathan Ellis commented on CASSANDRA-5906:

bq. Since ByteBuffer's hashCode is only a function of the number of bits remaining we cannot
use it directly in the offer function.

I don't follow -- that should be exactly the desired behavior.  The ByteBuffer offset/remaining
are telling us, "this is the part of the backing array that we're interested in," which lets
us "split up" regions of memory without having to actually copy to new arrays.
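To illustrate the point about offset/remaining: the Java SE documentation defines ByteBuffer.hashCode() (like equals()) over the remaining elements only, from position() up to limit() - 1. Below is a minimal sketch, with hypothetical helper names regionHash and copyHash, showing that a view over part of a shared array hashes exactly like a separate copy of those bytes. That is the property the reply relies on when it says a region can be "split up" without copying.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class RegionHashDemo {
    // Hash a sub-range of a shared array through a ByteBuffer view (no copy).
    // wrap(array, offset, length) sets position = offset and limit = offset + length,
    // so hashCode() covers only those `length` bytes.
    static int regionHash(byte[] backing, int offset, int length) {
        return ByteBuffer.wrap(backing, offset, length).hashCode();
    }

    // Hash a freshly allocated copy of the same bytes, for comparison.
    static int copyHash(byte[] backing, int offset, int length) {
        byte[] copy = Arrays.copyOfRange(backing, offset, offset + length);
        return ByteBuffer.wrap(copy).hashCode();
    }

    public static void main(String[] args) {
        byte[] backing = {9, 1, 2, 3, 9};
        ByteBuffer region = ByteBuffer.wrap(backing, 1, 3); // view of {1, 2, 3}
        ByteBuffer copy = ByteBuffer.wrap(new byte[] {1, 2, 3});
        System.out.println(region.equals(copy));                                  // true
        System.out.println(regionHash(backing, 1, 3) == copyHash(backing, 1, 3)); // true
    }
}
```

The surrounding bytes (the 9s) never affect the result, so callers only need to set position and limit correctly before handing the buffer to a hash-based structure.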

bq. The size of the HLL is a function of how precise you need it to be. If we use a p of 15
instead of 16 the size drops to 21K. Inserting the same 500K elements into a HLL+ with p=15
yields an error of .58% in my tests.
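The size/precision trade-off quoted above can be checked with back-of-the-envelope arithmetic. This is a sketch under two assumptions not stated in the thread: the textbook HyperLogLog relative standard error of 1.04/√m with m = 2^p registers, and a dense layout packing six 5-bit registers into each 32-bit word. The class and method names are hypothetical.

```java
public class HllSizing {
    // Number of registers for precision p: m = 2^p.
    static int registers(int p) {
        return 1 << p;
    }

    // ASSUMED dense layout: six 5-bit registers packed into each 32-bit word.
    static int denseBytes(int p) {
        int words = (registers(p) + 5) / 6; // ceiling of m / 6
        return words * 4;                   // 4 bytes per 32-bit word
    }

    // Textbook HyperLogLog relative standard error: 1.04 / sqrt(m).
    static double stdError(int p) {
        return 1.04 / Math.sqrt(registers(p));
    }

    public static void main(String[] args) {
        for (int p = 15; p <= 16; p++) {
            System.out.printf("p=%d registers=%d bytes=%d stdError=%.3f%%%n",
                    p, registers(p), denseBytes(p), 100 * stdError(p));
        }
    }
}
```

Under these assumptions p=15 comes to 21,848 bytes (about 21K, in line with the quoted figure) with an expected error of about 0.57%, while p=16 takes 43,692 bytes for an expected error of about 0.41%. Doubling the register count shrinks the textbook error by a factor of √2.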

So, we can trade a factor of 2 in size for roughly a factor of 2 in precision?  Unless we have
a use for keeping these on heap that I can't think of, I'd say we should double the size and
only read them in for compaction.
> Avoid allocating over-large bloom filters
> -----------------------------------------
>                 Key: CASSANDRA-5906
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Yuki Morishita
>             Fix For: 2.0.1
> We conservatively estimate the number of partitions post-compaction to be the total number
> of partitions pre-compaction.  That is, we assume the worst-case scenario of no partition
> overlap at all.
> This can result in substantial memory wasted in sstables resulting from highly overlapping
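The overestimate described in the issue can be made concrete with the standard bloom filter sizing formula m = -n ln(fp) / (ln 2)^2, where fp is the target false-positive rate. Below is a minimal sketch, assuming three fully overlapping sstables and a 1% false-positive target (both made up for illustration). The helper names are hypothetical, and the exact HashSet union only stands in for whatever cardinality estimate a fix would actually use; it is not Cassandra's code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BloomSizingDemo {
    // Bits for a bloom filter holding n keys at false-positive rate fp,
    // using the standard optimum m = -n * ln(fp) / (ln 2)^2.
    static long bloomBits(long n, double fp) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(fp) / (ln2 * ln2));
    }

    // Worst-case estimate described in the issue: assume no overlap and add the counts.
    static long sumEstimate(List<Set<String>> sstables) {
        long total = 0;
        for (Set<String> keys : sstables) {
            total += keys.size();
        }
        return total;
    }

    // Exact number of distinct keys across all inputs. A merged cardinality
    // estimator (e.g. HyperLogLog) would approximate this without holding the keys.
    static long unionCount(List<Set<String>> sstables) {
        Set<String> union = new HashSet<>();
        for (Set<String> keys : sstables) {
            union.addAll(keys);
        }
        return union.size();
    }

    // Hypothetical input: `tables` sstables that all contain the same keys.
    static List<Set<String>> identicalSstables(int tables, int keysPerTable) {
        List<Set<String>> result = new ArrayList<>();
        for (int t = 0; t < tables; t++) {
            Set<String> keys = new HashSet<>();
            for (int k = 0; k < keysPerTable; k++) {
                keys.add("key" + k);
            }
            result.add(keys);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> sstables = identicalSstables(3, 1000);
        long sum = sumEstimate(sstables);   // 3000
        long union = unionCount(sstables);  // 1000
        System.out.println("sum:   " + sum + " keys, " + bloomBits(sum, 0.01) + " bits");
        System.out.println("union: " + union + " keys, " + bloomBits(union, 0.01) + " bits");
    }
}
```

Because the filter size grows linearly in n, sizing for the summed count in this fully overlapping case allocates roughly three times the bits the merged sstable needs.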
