incubator-cassandra-user mailing list archives

From Terje Marthinussen <tmarthinus...@gmail.com>
Subject Re: Alternative Row Cache Implementation
Date Fri, 01 Jul 2011 00:41:36 GMT
We had a visitor from Intel a month ago.

One question from him was "What could you do if we gave you a server 2 years
from now that had 16TB of memory?"

I went.... Eh... using Java....?

2 years is maybe unrealistic, but you can already get some quite acceptable
prices even on servers in the 100GB memory range now if you buy in larger
quantities (30-50 servers and more in one go).

I don't think it is unrealistic that we will start seeing high end consumer
(x64) servers with TBs of memory in a few years, and I really wonder where
that puts Java-based software.

Terje

On Fri, Jul 1, 2011 at 2:25 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

>
>
> On Thu, Jun 30, 2011 at 12:44 PM, Daniel Doubleday <
> daniel.doubleday@gmx.net> wrote:
>
>> Hi all - or rather devs
>>
>> we have been working on an alternative implementation to the existing row
>> cache(s)
>>
>> We have 2 main goals:
>>
>> - Decrease memory -> get more rows in the cache without suffering a huge
>> performance penalty
>> - Reduce gc pressure
>>
>> This sounds a lot like we should be using the new serializing cache in
>> 0.8.
>> Unfortunately our workload consists of loads of updates which would
>> invalidate the cache all the time.
>>
>> The second unfortunate thing is that the idea we came up with doesn't fit
>> the new cache provider api...
>>
>> It looks like this:
>>
>> Like the serializing cache, we basically only cache the serialized byte
>> buffer. We don't serialize the bloom filter, and we try to do some other
>> minor compression tricks (var ints etc., not done yet). The main difference
>> is that we don't deserialize but use the normal sstable iterators and
>> filters as in the regular uncached case.
>>
>> So the read path looks like this:
>>
>> return filter.collectCollatedColumns(memtable iter, cached row iter)
>>
>> The write path is not affected. It does not update the cache.
>>
>> During flush we merge all memtable updates with the cached rows.
>>
>> These are early test results:
>>
>> - Depending on row width and value size, the serialized cache takes between
>> 30% and 50% of the memory of a cached CF. This might be optimized further
>> - Read times increase by 5 - 10%
>>
>> We haven't tested the effects on gc but hope that we will see improvements
>> there because we only cache a fraction of objects (in terms of numbers) in
>> old gen heap which should make gc cheaper. Of course there's also the option
>> to use native mem like serializing cache does.
>>
>> We believe that this approach is quite promising but as I said it is not
>> compatible with the current cache api.
>>
>> So my question is: does that sound interesting enough to open a jira or
>> has that idea already been considered and rejected for some reason?
>>
>> Cheers,
>> Daniel
>>
>
>
>
> The problem I see with the row cache implementation is more of a JVM
> problem. This problem is not localized to Cassandra (IMHO), as I hear HBase
> people have similar issues with large caches and large Xmx. Personally, I
> feel this is a sign of Java showing its age. "Let us worry about the
> pointers" was a great solution when systems had 32MB of memory, because the
> cost of walking the object graph was small and the walk was possible in
> small time windows. But JVMs already cannot handle 13+ GB of RAM, and it is
> quite common to see systems with 32-64GB of physical memory. I am very
> curious to see how Java is going to evolve on systems with 128GB of memory
> or more.
>
> The G1 collector will help somewhat, however I do not see that really
> pushing Xmx higher than it is now. HBase has even gone the route of using an
> off heap cache, https://issues.apache.org/jira/browse/HBASE-4018 , and
> some Jira mentions Cassandra exploring this alternative as well.
>
> Doing whatever is possible to shrink the current size of an item in the
> cache is awesome. Anything that delivers more bang for the buck is +1.
> However I feel that VFS cache is the only way to effectively cache large
> datasets. I was quite disappointed when I upped a machine from 16GB to 48GB
> of physical memory. I said to myself "Awesome! Now I can shave off a couple
> of GB for larger row caches". I changed Xmx from 9GB to 13GB, upped the
> caches, and restarted. I found the system spending a lot of time managing
> heap, and also found that my compaction processes that did 200GB in 4 hours
> were now taking 6 or 8 hours.
>
> I had heard that JVMs "top out around 20GB" but I found they "top out" much
> lower. VFS cache +1
>
>
>
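
For readers following the thread: the read path Daniel describes in the quoted message (collating a memtable iterator with a cached row iterator) can be sketched roughly as below. This is only an illustration under stated assumptions, not Cassandra's code: `CollatedRead`, `Col`, and `collate` are hypothetical names, both iterators are assumed to be sorted by column name, and the "newer timestamp wins" rule on a name collision is an assumption about reconciliation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: merge memtable columns with cached row columns at read time.
final class CollatedRead {

    // Hypothetical stand-in for a column: name, value, write timestamp.
    static final class Col {
        final String name;
        final String value;
        final long timestamp;

        Col(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Merge two iterators that are each sorted by column name.
    // When both sides hold the same column name, keep the newer timestamp
    // (assumption; ties go to the memtable side).
    static List<Col> collate(Iterator<Col> memtableIter, Iterator<Col> cachedRowIter) {
        List<Col> out = new ArrayList<>();
        Col m = next(memtableIter);
        Col c = next(cachedRowIter);
        while (m != null || c != null) {
            if (c == null) {
                out.add(m);
                m = next(memtableIter);
            } else if (m == null) {
                out.add(c);
                c = next(cachedRowIter);
            } else {
                int cmp = m.name.compareTo(c.name);
                if (cmp < 0) {
                    out.add(m);
                    m = next(memtableIter);
                } else if (cmp > 0) {
                    out.add(c);
                    c = next(cachedRowIter);
                } else {
                    out.add(m.timestamp >= c.timestamp ? m : c);
                    m = next(memtableIter);
                    c = next(cachedRowIter);
                }
            }
        }
        return out;
    }

    // Return the next element, or null when the iterator is exhausted.
    private static Col next(Iterator<Col> it) {
        return it.hasNext() ? it.next() : null;
    }
}
```

The point of the sketch is that writes never touch the cached row; a read simply overlays whatever is in the memtable on top of it, which is why an update-heavy workload would not invalidate the cache.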

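Both quoted messages mention keeping cached data outside the Java heap. A minimal sketch of that idea, assuming rows are already serialized to byte arrays, is shown here. `OffHeapRowCache` is a hypothetical class; it uses direct `ByteBuffer`s as a stand-in and is not meant to reflect how Cassandra's serializing cache or the HBase off-heap cache actually allocate memory. Eviction and size limits are left out.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: keep serialized rows in direct (off-heap) buffers so the
// garbage collector only tracks one small buffer handle per row.
final class OffHeapRowCache {
    private final Map<String, ByteBuffer> rows = new HashMap<>();

    // Copy the serialized row into memory allocated outside the Java heap.
    void put(String key, byte[] serializedRow) {
        ByteBuffer buf = ByteBuffer.allocateDirect(serializedRow.length);
        buf.put(serializedRow);
        buf.flip(); // switch from writing to reading: position 0, limit = length
        rows.put(key, buf);
    }

    // Copy the row back onto the heap for the caller; returns null on a miss.
    byte[] get(String key) {
        ByteBuffer buf = rows.get(key);
        if (buf == null) {
            return null;
        }
        byte[] copy = new byte[buf.remaining()];
        // duplicate() shares the bytes but has its own position,
        // so reading does not disturb the stored buffer.
        buf.duplicate().get(copy);
        return copy;
    }
}
```

The key map and the buffer handles still live on the heap, so this reduces the number and size of objects the collector has to walk rather than removing them entirely.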