incubator-cassandra-user mailing list archives

From Todd Lipcon <t...@lipcon.org>
Subject Re: cassandra vs hbase summary (was facebook messaging)
Date Tue, 23 Nov 2010 01:25:11 GMT
On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> @Todd. Good catch about caching HFile blocks.
>
> My point still applies though. Caching HFile blocks on a single node
> vs individual "datums" on N nodes may not be more efficient. Thus
> terms like "Slower" and "Less Efficient" could be very misleading.
>
> Isn't caching only the item more efficient? In cases with high random
> reads, is evicting single keys more efficient than evicting blocks in
> terms of memory churn?
>
> These are difficult questions to answer absolutely so seeing bullet
> points such as '#Cassandra has slower this' are oversimplifications of
> complex problems.
>

Definitely complex, especially in a system like Java, where memory accounting
is often difficult to quantify. Depending on the data structure used for
your cache, you are likely to have at least 8-16 bytes of overhead per item
in the data structure, and quite possibly much more. For example, we
calculate the following overhead for our cache:

    CONCURRENT_HASHMAP_ENTRY = align(REFERENCE + OBJECT + (3 * REFERENCE) +
        (2 * Bytes.SIZEOF_INT));
which ends up being something like 48 bytes per entry on a 64-bit JVM.
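
To make that concrete, here is a rough, self-contained sketch of the same
accounting. The REFERENCE, OBJECT, and align() definitions below are
assumptions modeled on a typical 64-bit JVM layout (actual pointer and
header sizes vary with JVM version and flags), not the real constants from
our code:

    // A rough sketch of the per-entry accounting above. The constants are
    // assumptions for a typical 64-bit JVM, not exact figures.
    public class CacheEntryOverhead {
        static final int REFERENCE = 8;  // assumed size of an object reference
        static final int OBJECT = 16;    // assumed object header size
        static final int SIZEOF_INT = 4;

        // Round up to the 8-byte boundary the JVM typically aligns to.
        static long align(long size) {
            return (size + 7) & ~7L;
        }

        public static void main(String[] args) {
            // Table slot reference + entry object header + key/value/next
            // references + int fields, mirroring the expression above.
            long perEntry = align(REFERENCE + OBJECT + (3 * REFERENCE)
                    + (2 * SIZEOF_INT));
            // Prints 56 with these assumptions; call it ~48-56 bytes
            // depending on the header and pointer sizes you assume.
            System.out.println(perEntry + " bytes per cached item");
        }
    }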

So, if your rows are small (e.g., 64 bytes), caching a single 64KB block
holding 1000 such rows, at roughly 64 bytes of overhead for the one cache
entry, is much more RAM-efficient than caching 1000 individual 64-byte rows
and paying 48KB of overhead for their entries.
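
Spelled out with those (assumed) numbers, the comparison looks like this:

    // Worked comparison: one cached 64KB block holding 1000 small rows
    // vs. 1000 individually cached 64-byte rows, assuming ~48 bytes of
    // bookkeeping per cache entry (the figure estimated above).
    public class BlockVsRowCache {
        public static void main(String[] args) {
            final int ROW_SIZE = 64;            // bytes per row
            final int ROWS = 1000;              // rows in one 64KB block
            final int PER_ENTRY_OVERHEAD = 48;  // assumed bytes per entry

            // Block cache: a single entry covers all 1000 rows.
            long blockCache = (long) ROWS * ROW_SIZE + PER_ENTRY_OVERHEAD;
            // Row cache: every row is its own entry.
            long rowCache = (long) ROWS * (ROW_SIZE + PER_ENTRY_OVERHEAD);

            System.out.println("block cache: " + blockCache);  // 64048 (~64KB)
            System.out.println("row cache:   " + rowCache);    // 112000 (~112KB)
        }
    }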

I agree, of course, that absolute statements are oversimplifications.
Discussions like these that draw out the differences between the systems are
productive, though - I think each system can learn things from the other.

-Todd
