cassandra-commits mailing list archives

From "Yuki Morishita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5906) Avoid allocating over-large bloom filters
Date Fri, 22 Nov 2013 18:40:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830212#comment-13830212 ]

Yuki Morishita commented on CASSANDRA-5906:
-------------------------------------------

So far, I have tested HLL++ on its own, measuring serialized size and error % across various parameters:
https://docs.google.com/a/datastax.com/spreadsheet/ccc?key=0AsVe14L_ijtkdEhDbk1rTjYwb3ZjdXFlTnNCNnk2cGc#gid=13

We can reduce the serialized size from what was originally posted here (p=16, sp=0) to less
than 10k with p=13, sp=25. Sparse mode also saves space when the number of partitions is small.
I think a relative error of 2% in the estimated partition size is tolerable for constructing a
bloom filter (though I don't have a formula to prove it :P).
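
To put the p values in context, here is a rough Java sketch of the textbook HyperLogLog arithmetic. It is not the benchmark behind the spreadsheet: the class and method names are hypothetical, it assumes 6 bits per register in dense mode, and it uses the standard 1.04/sqrt(m) error bound, so the library's actual serialized size and measured error will differ.

```java
public final class HllSizing {
    // Number of registers for precision p: m = 2^p.
    public static int registerCount(int p) {
        return 1 << p;
    }

    // Dense-mode register payload in bytes, ASSUMING 6 bits per register.
    // A real library's serialized form adds headers and may pad registers.
    public static int denseRegisterBytes(int p) {
        return (registerCount(p) * 6 + 7) / 8;
    }

    // Textbook HyperLogLog relative standard error: 1.04 / sqrt(m).
    public static double standardError(int p) {
        return 1.04 / Math.sqrt(registerCount(p));
    }

    public static void main(String[] args) {
        // Compare the two precisions mentioned in the comment.
        for (int p : new int[] {13, 16}) {
            System.out.printf("p=%d registers=%d denseBytes=%d stdErr=%.2f%%%n",
                    p, registerCount(p), denseRegisterBytes(p), 100 * standardError(p));
        }
    }
}
```

Under these assumptions, p=13 means 8192 registers and 6144 bytes of register payload with a standard error of about 1.15%, while p=16 means 65536 registers and 49152 bytes with about 0.41%.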


> Avoid allocating over-large bloom filters
> -----------------------------------------
>
>                 Key: CASSANDRA-5906
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5906
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Yuki Morishita
>             Fix For: 2.1
>
>
> We conservatively estimate the number of partitions post-compaction to be the total number
> of partitions pre-compaction.  That is, we assume the worst-case scenario of no partition
> overlap at all.
> This can result in substantial memory wasted in sstables resulting from highly overlapping
> compactions.
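
The effect described in the issue can be illustrated with the textbook Bloom filter sizing formula, m = -n ln(fp) / (ln 2)^2. This is only a sketch: the class name, partition counts, and false-positive rate are hypothetical, and Cassandra's own sizing code may compute the filter size differently.

```java
public final class BloomSizing {
    // Textbook optimal Bloom filter size in bits for `keys` entries at
    // false-positive rate `fp`: m = -n * ln(fp) / (ln 2)^2.
    public static long bitsFor(long keys, double fp) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-keys * Math.log(fp) / (ln2 * ln2));
    }

    public static void main(String[] args) {
        long perSstable = 1_000_000L; // hypothetical partitions per input sstable
        int inputs = 4;               // hypothetical number of sstables compacted
        double fp = 0.01;             // hypothetical target false-positive rate

        // Worst-case estimate: sum of input partition counts (no overlap assumed).
        long worstCase = bitsFor(perSstable * inputs, fp);
        // Opposite extreme: every input sstable holds the same partitions.
        long fullOverlap = bitsFor(perSstable, fp);

        System.out.printf("worst-case estimate: %d bytes%n", worstCase / 8);
        System.out.printf("full overlap needs:  %d bytes%n", fullOverlap / 8);
    }
}
```

Because the formula is linear in the key count, a filter sized for the summed estimate is about four times larger than needed when all four inputs overlap completely. By the same linearity, a small relative error in the estimated count changes the filter size by roughly the same percentage.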



--
This message was sent by Atlassian JIRA
(v6.1#6144)
