cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
Date Mon, 12 Jan 2015 23:39:36 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274409#comment-14274409
] 

Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------

I did another review. The additional test coverage looks great.

Don’t throw Error, throw runtime exceptions on things like serialization issues. The only
place it make sense to throw error is when allocating memory fails. That would match the behavior
of ByteBuffer.allocateDirect. I don’t see failure to allocate from the heap allocator as
recoverable even in the context of the cache. IOError is thrown from one place in the entire
JDK (Console) so it's an odd choice.

freeCapacity should reall be a field inside each segment and full/not full and eviction decisions
should be made inside each segment independently. In practice inside C* it’s probably fine
as just an AtomicLong, but I want to see OHC be all it can be.

Rehash test could validate the data. After the rehash. It could also validate the rehash under
concurrent access, say have a  reader thread that is randomly accessing values < the already
inserted value.I can’t tell if the crosscheck test inserts enough values to trigger rehashing.

Inlining the murmur3 changes makes me a little uncomfortable. It’s good see see some test
coverage comparing with another implementation, but it’s over a small set of data. It seems
like the Unsigned stuff necessary to perfectly mimic the native version of murmur3 is missing?

Add 2-4 byte coed points for the UTF-8 tests.

FastUtil is a 17 megabyte dependency all to get one array list.

The cross checking implementation is really nice.

Looking at the AbstractKeyIterator, I don’t see how it can do the right thing when a segment
rehashes. It will point to a random spot in the segment after a rehash right? In practice
maybe this doesn’t matter since they should size up promptly and it’s just an optimization
that we dump this stuff at all. I can understand what the current code does so I lean towards
keeping it.

There are a couple of places (serializeForPut, putInternal, maybe others) where there are
two exception handlers that each de-allocate the same piece of memory. The deallocation could
go in a finally instead of the exception handlers since it always happens.


> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Robert Stupp
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is partially off heap, keys are still stored in JVM heap as
BB, 
> * There is a higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but
this requires careful tunning.
> * Overhead in Memory for the cache entries are relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and
use JNI to interact with cache. We might want to ensure that the new implementation match
the existing API's (ICache), and the implementation needs to have safe memory access, low
overhead in memory and less memcpy's (As much as possible).
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message