cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
Date Tue, 02 Dec 2014 07:49:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231132#comment-14231132
] 

Robert Stupp commented on CASSANDRA-7438:
-----------------------------------------

[~aweisberg], thanks for the review :)

Some of the changes you suggested are already in. Much of the code has been moved to OffHeapMap.
Batch cleanup has been completely removed - it's now handled inside OffHeapMap.
It makes runtime and code much nicer.

I've delayed descent unit tests to later - the stuff changed to often.
And there would be a big change when merging the stuff into C* code base, removing all that
unused code and Cache interface implementation.

All of the duplicated stuff has been removed - we don't need that - even for a general-purpose
cache it would have been not useful.

bq. Key buffer might be worth a thread local sized to a high watermark
Hm - do you mean sth like {{static final ThreadLocal<KeyBuffer> perThreadBuffer;}} inside
SegmentedCacheImpl?

Regarding the "line level code review" (it's fine the way you did it IMO):

bq. Don't catch exceptions
Done.

bq. 8 bytes at a time, how does it handle trailing bytes and alignment?
Tailing bytes: fails back to per-byte comparison. Alignment: key and value are aligned on
8-byte boundaries.

bq. Agrona has an Unsafe ByteBuffer implementation that looks like it makes a little better
use of various intrinsics then AbstractDataOutput.
Good hint! Will definitely take a look at it!

bq. I don't see why the cache can't use CPU endianness since the key/value are just copied.
Ah - you mean that stuff in HashEntryInput/Output. No - you can't always copy it using unsafe
API.
I don't recall exactly why I removed that optimization (had that implemented before), but
it had sth to do with data serialized for KeyBuffer and putting it into off-heap.
But it makes sense for values (since these are always directly serialized to off-heap).

bq. UTF encoded string stuff ... get that via inheritance?
Yes, basically from JDK. Could not get that via inheritance.

bq. hashing the key yourself ... 32-bits
Thought about it (and had that previously). Yes - if we have a good hash code, we can use
it.
But I don't know whether the calling code has a hash code.
IMO hash code should be 64 bits because 32 bits might not be sufficient.

bq. allocation fails maybe throw OutOfMemoryError
That would shut down C* daemon ;) Maybe. Not sure about that.
I think if you run into such a situation (out of off-heap/system memory) you are completely
lost.
It just ignores that put() and removes the old entry.

bq. entry is too large maybe return an error of some sort
No. The calling code cannot do anything meaningful with it. But the calling could could check
for that in advance (before constructing any object related to caching), if it has enough
information.

bq. catch VirtualMachineError and not error
done

bq. hotN()
I _think_ it is used to persist the hot set of the cache.

bq. concerned about materializing the full list on heap
Agree. Thought about "patching" cache off-heap addresses into DirectByteBuffer and using that
for serialization.

bq. I don't think storage for 16 million keys is enough?
Nope - would not be. But it's 2^27 (limited by a stupid constant used for both max# of segments
and max# of buckets). Worth taking a look at it - it's weird, yes.

bq. value loading case,
Don't think we need that API.

bq. Release is not synchronized.
Yep - will do that.

(Hope I caught all of your comments)

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is partially off heap, keys are still stored in JVM heap as
BB, 
> * There is a higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but
this requires careful tunning.
> * Overhead in Memory for the cache entries are relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and
use JNI to interact with cache. We might want to ensure that the new implementation match
the existing API's (ICache), and the implementation needs to have safe memory access, low
overhead in memory and less memcpy's (As much as possible).
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message