cassandra-commits mailing list archives

From "Gustav Munkby (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-9060) Anticompaction hangs on bloom filter bitset serialization
Date Fri, 27 Mar 2015 19:35:52 GMT
Gustav Munkby created CASSANDRA-9060:
----------------------------------------

             Summary: Anticompaction hangs on bloom filter bitset serialization 
                 Key: CASSANDRA-9060
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9060
             Project: Cassandra
          Issue Type: Bug
            Reporter: Gustav Munkby
            Priority: Minor


I tried running an incremental repair against a 15-node vnode cluster holding roughly 500GB
of data, running on 2.1.3-SNAPSHOT, without performing the suggested migration steps. I manually
chose a small range for the repair (using --start/end-token). The repair itself took almost
no time at all, but the anticompactions took a long time (not surprisingly).

Obviously, this might not be the ideal way to run incremental repairs, but I wanted to look
into what made the whole process so slow. The results were rather surprising. The majority
of the time was spent serializing bloom filters.

The reason seemed to be two-fold. First, the bloom filters generated were huge (probably because
the original SSTables were large). With a proper migration to incremental repairs, I'm guessing
this would not happen. Second, however, the bloom filters were being written to the output
one byte at a time (with quite a few type conversions on the way) to transform the little-endian
in-memory representation into the big-endian on-disk representation.
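
To illustrate the kind of per-byte conversion described above, here is a minimal sketch. It is not
the actual Cassandra code: the class and method names are hypothetical, and the bitset is assumed
to be a plain little-endian ByteBuffer of 64-bit words. Every word costs eight single-byte reads
plus widening and shifting before it reaches the output.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical illustration only; not taken from the Cassandra source tree.
final class ByteAtATimeSketch {

    // Serializes 64-bit words stored little-endian in memory as a word count
    // followed by big-endian longs, touching the source one byte at a time.
    static byte[] serializeByteByByte(ByteBuffer littleEndianWords) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        int base = littleEndianWords.position();
        int words = littleEndianWords.remaining() / 8; // trailing partial word is ignored
        try {
            out.writeInt(words);
            for (int w = 0; w < words; w++) {
                long value = 0;
                for (int b = 0; b < 8; b++) {
                    // Byte b of a little-endian word carries bits 8*b .. 8*b+7.
                    long unsigned = littleEndianWords.get(base + w * 8 + b) & 0xFFL;
                    value |= unsigned << (8 * b);
                }
                out.writeLong(value); // DataOutputStream always writes big-endian
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

For a filter of many megabytes, the inner loop above runs once per byte of the bitset, which is
consistent with serialization dominating the anticompaction time.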

I have implemented a solution where big-endian is used in-memory as well as on-disk, which
makes de-/serialization much, much faster. This introduces some slight overhead
when checking the bloom filter, but I can't see how that would be problematic. An obvious
alternative would be to still perform the serialization/deserialization using a byte array,
but perform the byte-order swap there.
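
The byte-array alternative could look roughly like the sketch below. This is an assumption-laden
illustration, not the patch attached to this issue: the names are hypothetical and the in-memory
bitset is again modelled as a little-endian ByteBuffer. The swap is done through LongBuffer views,
so the result can be handed to the output as a single array write.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the patch from this issue.
final class BulkSwapSketch {

    // Copies little-endian 64-bit words into a big-endian byte array in one
    // bulk transfer; the views handle the byte-order swap word by word.
    static byte[] toBigEndian(ByteBuffer littleEndianWords) {
        // duplicate() resets the byte order, so set it explicitly on the copy.
        ByteBuffer src = littleEndianWords.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int words = src.remaining() / 8; // trailing partial word is ignored
        ByteBuffer dst = ByteBuffer.allocate(words * 8); // big-endian by default
        dst.asLongBuffer().put(src.asLongBuffer());
        return dst.array();
    }

    // Reverse direction: big-endian on-disk bytes back to a little-endian buffer.
    static ByteBuffer fromBigEndian(byte[] onDisk) {
        ByteBuffer dst = ByteBuffer.allocate(onDisk.length / 8 * 8)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        dst.asLongBuffer().put(ByteBuffer.wrap(onDisk).asLongBuffer());
        return dst;
    }
}
```

Compared with keeping the in-memory layout big-endian, this leaves bloom filter lookups untouched
and pays for the swap only during serialization, at the cost of a temporary array the size of the filter.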



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
