cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jonathan Ellis (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2466) bloom filters should avoid huge array allocations to avoid fragmentation concerns
Date Mon, 17 Oct 2011 13:13:10 GMT


Jonathan Ellis commented on CASSANDRA-2466:

Some comments:

            for (int i = 0; i < pageSize && bitLength-- > 0; i++)

Isn't one of those bounds checks redundant?  It looks like wlen/bitLength is the right one
to use.

* lots of commented-out methods in OBS. Let's remove them entirely if we're not going to implement.
* OBS.setPage is unnecessary
> bloom filters should avoid huge array allocations to avoid fragmentation concerns
> ---------------------------------------------------------------------------------
>                 Key: CASSANDRA-2466
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Priority: Minor
>         Attachments: 2466v1.patch,,
> The fact that bloom filters are backed by single large arrays of longs is expected to
interact badly with promotion of objects into old gen with CMS, due to fragmentation concerns
(as discussed in CASSANDRA-2463).
> It should be less of an issue than CASSANDRA-2463 in the sense that you need to have
a lot of rows before the array sizes become truly huge. For comparison, the ~ 143 million
row key limit implied by the use of 'int' in BitSet prior to the switch to OpenBitSet translates
roughly to 238 MB (assuming the limitation factor there was the addressability of the bits
with a 32 bit int, which is my understanding).
> Having a preliminary look at OpenBitSet with an eye towards replacing the single long[]
with multiple arrays, it seems that if we're willing to drop some of the functionality that
is not used for bloom filter purposes, the bits[i] indexing should be pretty easy to augment
with modulo to address an appropriate smaller array. Locality is not an issue since the bloom
filter case is the worst possible case for locality anyway, and it doesn't matter whether
it's one huge array or a number of ~ 64k arrays.
> Callers may be affected like BloomFilterSerializer which cares about the underlying bit
> If the full functionality of OpenBitSet is to be maintained (e.g., xorCount) some additional
acrobatics would be necessary and presumably at a noticable performance cost if such operations
were to be used in performance critical places.
> An argument against touching OpenBitSet is that it seems to be pretty carefully written
and tested and has some non-trivial details and people have seemingly benchmarked it quite
carefully. On the other hand, the improvement would then apply to other things as well, such
as the bitsets used to keep track of in-core pages (off the cuff for scale, a 64 gig sstable
should imply a 2 mb bit set, with one bit per 4k page).

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message