cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-5906) Avoid allocating over-large bloom filters
Date Thu, 19 Sep 2013 12:36:53 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771836#comment-13771836
] 

Jonathan Ellis edited comment on CASSANDRA-5906 at 9/19/13 12:35 PM:
---------------------------------------------------------------------

bq. Since ByteBuffer's hashCode is only a function of the number of bits remaining we cannot
use it directly in the offer function.

I don't follow -- that should be exactly the desired behavior.  The ByteBuffer offset/remaining
are telling us, "this is the part of the backing array that we're interested in," which lets
us "split up" regions of memory without having to actually copy to new arrays.  So BBU.getArray
is only for cases where an API accepts nothing but arrays, so that possibly performing a copy
is the only alternative:

{code}
    /**
     * You should almost never use this.  Instead, use the write* methods to avoid copies.
     */
    public static byte[] getArray(ByteBuffer buffer)
    {
        int length = buffer.remaining();

        if (buffer.hasArray())
        {
            int boff = buffer.arrayOffset() + buffer.position();
            if (boff == 0 && length == buffer.array().length)
                return buffer.array();
            else
                return Arrays.copyOfRange(buffer.array(), boff, boff + length);
        }
        // else, DirectByteBuffer.get() is the fastest route
        byte[] bytes = new byte[length];
        buffer.duplicate().get(bytes);

        return bytes;
    }
{code}
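
To illustrate the point about offset/remaining, here is a minimal standalone sketch (not Cassandra code; the class name ByteBufferViewDemo and the helper view() are made up for this example). It wraps two regions of one backing array without copying and shows that equals/hashCode consider only the remaining bytes of each view:

```java
import java.nio.ByteBuffer;

public class ByteBufferViewDemo
{
    /** Hypothetical helper: a view over backing[offset, offset + length), no copy made. */
    public static ByteBuffer view(byte[] backing, int offset, int length)
    {
        // wrap() sets position = offset and limit = offset + length on the shared array
        return ByteBuffer.wrap(backing, offset, length);
    }

    public static void main(String[] args)
    {
        byte[] backing = { 1, 2, 3, 4, 9, 9, 1, 2, 3, 4 };

        ByteBuffer first = view(backing, 0, 4);
        ByteBuffer second = view(backing, 6, 4);
        ByteBuffer standalone = ByteBuffer.wrap(new byte[]{ 1, 2, 3, 4 });

        // Both views share the same backing array
        System.out.println(first.array() == second.array());            // true
        // Equality and hashing look only at the remaining bytes of each buffer
        System.out.println(first.equals(second));                       // true
        System.out.println(first.hashCode() == standalone.hashCode());  // true
    }
}
```

So two buffers that expose the same bytes hash identically no matter where those bytes sit in the backing array, which is what a caller hashing "the key" would want.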

bq. The size of the HLL is a function of how precise you need it to be. If we use a p of 15
instead of 16 the size drops to 21K. Inserting the same 500K elements into a HLL+ with p=15
yields an error of .58% in my tests.

So, we can trade a factor of 2 size for roughly a factor of 2 precision?  Unless we have
a use for keeping these on heap that I can't think of, I'd say we should double the size and
only read them in for compaction.
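
As a rough aid for the size/precision question, the sketch below uses the usual HyperLogLog rule of thumb: m = 2^p registers and a relative standard error of about 1.04/sqrt(m). The class name HllSizing is made up, and the packing of six registers per 32-bit word is an assumption chosen because it is consistent with the 21K figure quoted above for p=15; it is not taken from the HLL+ implementation under discussion.

```java
public class HllSizing
{
    /** Number of registers for precision p: m = 2^p. */
    public static int registers(int p)
    {
        return 1 << p;
    }

    /** Rule-of-thumb relative standard error for HyperLogLog: 1.04 / sqrt(m). */
    public static double relativeError(int p)
    {
        return 1.04 / Math.sqrt(registers(p));
    }

    /** ASSUMPTION: six registers are packed into each 32-bit word (4 bytes). */
    public static int packedBytes(int p)
    {
        int words = (registers(p) + 5) / 6; // round up to whole words
        return words * 4;
    }

    public static void main(String[] args)
    {
        for (int p = 15; p <= 16; p++)
        {
            System.out.printf("p=%d registers=%d bytes~%d error~%.2f%%%n",
                              p, registers(p), packedBytes(p), 100 * relativeError(p));
        }
    }
}
```

Under these assumptions p=15 comes out at about 21K and roughly .57% error, in line with the numbers quoted above, while p=16 roughly doubles the size and improves the error by a factor of about sqrt(2) (about 1.4) rather than 2.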
                
      was (Author: jbellis):
    bq. Since ByteBuffer's hashCode is only a function of the number of bits remaining we
cannot use it directly in the offer function.

I don't follow -- that should be exactly the desired behavior.  The ByteBuffer offset/remaining
are telling us, "this is the part of the backing array that we're interested in," which lets
us "split up" regions of memory without having to actually copy to new arrays.

bq. The size of the HLL is a function of how precise you need it to be. If we use a p of 15
instead of 16 the size drops to 21K. Inserting the same 500K elements into a HLL+ with p=15
yields an error of .58% in my tests.

So, we can trade a factor of 2 size for roughly a factor of 2 precision?  Unless we have
a use for keeping these on heap that I can't think of, I'd say we should double the size and
only read them in for compaction.
                  
> Avoid allocating over-large bloom filters
> -----------------------------------------
>
>                 Key: CASSANDRA-5906
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5906
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Yuki Morishita
>             Fix For: 2.0.1
>
>
> We conservatively estimate the number of partitions post-compaction to be the total number
of partitions pre-compaction.  That is, we assume the worst-case scenario of no partition
overlap at all.
> This can result in substantial memory wasted in sstables resulting from highly overlapping
compactions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
