incubator-cassandra-user mailing list archives

From Colin <colpcl...@gmail.com>
Subject Re: Fetching ONE cell with a row cache hit takes 1 second on an idle box?
Date Wed, 02 Jul 2014 03:45:57 GMT
Rowcache is typically turned off because it is only useful in very specific situations: the
row(s) need to fit in memory, and the access patterns have to fit as well.

If all the rows you're accessing can fit, Rowcache is a great thing. Otherwise, not so great.
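
To make the cost concrete, here is a toy Python model of the behaviour described in the quoted thread below: the whole partition is cached as one unit, and any write to it throws the cached copy away. This is only a sketch under those assumptions, not Cassandra's actual implementation; the class `ToyRowCache`, its methods, and the `cells_assembled` counter are all made up for illustration.

```python
class ToyRowCache:
    """Toy model (hypothetical): caches whole partitions, invalidates on write."""

    def __init__(self, store):
        self.store = store          # partition key -> {column: value}
        self.cache = {}             # partition key -> cached copy of the partition
        self.cells_assembled = 0    # rough proxy for work done on cache misses

    def read_cell(self, key, column):
        if key not in self.cache:
            # Miss: the entire partition is assembled, even for a one-cell read.
            partition = dict(self.store[key])
            self.cells_assembled += len(partition)
            self.cache[key] = partition
        return self.cache[key][column]

    def write_cell(self, key, column, value):
        self.store.setdefault(key, {})[column] = value
        # Any write drops the whole cached partition.
        self.cache.pop(key, None)


store = {"bucket-1": {f"col{i}": i for i in range(100_000)}}
cache = ToyRowCache(store)

cache.read_cell("bucket-1", "col7")        # miss: assembles all 100,000 cells
cache.read_cell("bucket-1", "col8")        # hit: no extra work
cache.write_cell("bucket-1", "col7", -1)   # invalidates the cached partition
cache.read_cell("bucket-1", "col7")        # miss again: another 100,000 cells
print(cache.cells_assembled)
```

In this model, reading two cells around a single write assembles the full partition twice, which is why a big, frequently modified row is a poor fit.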

--
Colin
320-221-9531


> On Jul 1, 2014, at 10:40 PM, Kevin Burton <burton@spinn3r.com> wrote:
> 
> WOW.. so based on your advice, and a test, I disabled the row cache for the table.
> 
> The query was instantly 20x faster.
> 
> so this is definitely an anti-pattern.. I suspect Cassandra just tries to read the entire physical row into memory, and since my physical row is rather big.. ha.  Well that wasn't very fun :)
> 
> BIG win though ;)
> 
> 
>> On Tue, Jul 1, 2014 at 6:52 PM, Kevin Burton <burton@spinn3r.com> wrote:
>> A workaround for this is the VFS page cache.. basically, disabling compression and then allowing the VFS page cache to keep your data in memory.
>> 
>> The only downside is the per-column overhead.  But if you can store everything in a 'blob' which is optionally compressed, you're generally going to be ok.
>> 
>> Kevin
>> 
>> 
>>> On Tue, Jul 1, 2014 at 6:50 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>> so.. caching the *queries* ?
>>> 
>>> it seems like a better mechanism would be to cache the actual logical row, not the physical row.
>>> 
>>> Query caches just don't work in production. If you re-word your query, or structure it a different way, you get a miss…
>>> 
>>> In my experience.. query caches have a 0% hit rate.
>>> 
>>> 
>>>> On Tue, Jul 1, 2014 at 6:40 PM, Robert Coli <rcoli@eventbrite.com> wrote:
>>>>> On Tue, Jul 1, 2014 at 6:06 PM, Kevin Burton <burton@spinn3r.com> wrote:
>>>>> you know.. one thing I failed to mention is that this is going into a "bucket", and while it's a logical row, the physical row is like 500MB … according to compaction logs.
>>>>> 
>>>>> is the ENTIRE physical row going into the cache as one unit?  That's definitely going to be a problem in this model.  500MB is a big atomic unit.
>>>> 
>>>> Yes, the row cache is a row cache. It caches what the storage engine calls rows, which CQL calls "partitions." [1] Rows have to be assembled from all of their row fragments in Memtables/SSTables.
>>>> 
>>>> This is a big part of why the "off-heap" row cache's behavior of invalidation on write is so bad for its overall performance. Updating a single column in your 500MB row invalidates it and forces you to assemble the entire 500MB row from disk.
>>>> 
>>>> The only valid use case for the current off-heap row cache seems to be rows that are very small, very uniform in size, very hot, and very rarely modified.
>>>> 
>>>> https://issues.apache.org/jira/browse/CASSANDRA-5357
>>>> 
>>>> Is the ticket for replacing the row cache and its unexpected characteristics with something more like an actual query cache.
>>>> 
>>>>> also.. I assume it's having to do a binary search within the physical row?
>>>> 
>>>> Since the column-level bloom filter's removal in 1.2, the only way it can get to specific columns is via the index.
>>>> 
>>>> =Rob
>>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-6632
>>> 
>>> 
>>> 
>>> -- 
>>> Founder/CEO Spinn3r.com
>>> Location: San Francisco, CA
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
