cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
Date Mon, 12 Jan 2015 19:28:36 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274005#comment-14274005 ]

Ariel Weisberg edited comment on CASSANDRA-7438 at 1/12/15 7:28 PM:
--------------------------------------------------------------------

If you go all the way down the JMH rabbit hole, you don't need to do any of your own timing.
JMH will actually do some smart things to give you accurate timing and ameliorate the
impact of non-scalable/expensive timing measurement. Metrics uses System.nanoTime() internally,
so it isn't really any better as far as I can tell. System.nanoTime() on Linux is pretty scalable
(see http://shipilev.net/blog/2014/nanotrusting-nanotime/). When I tested it in JMH it actually
seemed to be linearly scalable, but JMH will solve that for you even on platforms where nanoTime
is finicky.

The C* integration looks good. I'm glad it was easy. When it comes to exposing configuration
parameters, less is more. I would prefer not to expose anything new, because once people start
using options they don't like to have them taken away (or disabled). We should make an
effort to set them automatically (or well enough), and if that fails we can add user-visible
configuration. My preference is to make the options accessible via properties as an escape
hatch in production, and then add them to config if we really can't set them automatically.

Can you prefix any System properties you have with a classname/package or something that makes
it clear they are part of OHC?
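
A minimal sketch of the two points above, using only the JDK: a setting is computed automatically, and a prefixed system property overrides it as an escape hatch. The prefix "org.example.ohc.", the property name "segmentCount", and the "2x CPUs" heuristic are all hypothetical placeholders, not OHC's actual names or defaults.

```java
public final class OhcProperties {
    // Hypothetical prefix; the point is only that every property clearly belongs to OHC.
    public static final String PREFIX = "org.example.ohc.";

    private OhcProperties() {}

    /**
     * Returns the value of the prefixed system property, or the automatically
     * chosen default when the property is unset, blank, or not a valid integer.
     */
    public static int intProperty(String name, int automaticDefault) {
        String raw = System.getProperty(PREFIX + name);
        if (raw == null || raw.trim().isEmpty()) {
            return automaticDefault;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // A malformed override falls back to the automatic value.
            return automaticDefault;
        }
    }

    /** Assumed heuristic for illustration only: two segments per available CPU. */
    public static int automaticSegmentCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors()) * 2;
    }

    public static void main(String[] args) {
        int segments = intProperty("segmentCount", automaticSegmentCount());
        System.out.println("segmentCount = " + segments);
    }
}
```

With this shape nothing new appears in the user-visible config, and an operator can still start the JVM with -Dorg.example.ohc.segmentCount=64 if the automatic value turns out to be wrong.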

The stress tool, when used without workload profiles, does some validation. It checks that values
are there and that the contents are correct.

I did not know about the JNA synchronized block. That is surprising, but I am glad to hear it
is getting fixed. For access to jemalloc I recommend using Unsafe with jemalloc loaded via LD_PRELOAD.
I think that would be the recommended approach and the one you should benchmark against, and
JNA would be there as a fallback. That gives you a JNI call for allocation/deallocation.
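
A rough sketch of the Unsafe route, assuming a HotSpot JVM where sun.misc.Unsafe is still reachable by reflection (newer JDKs warn about or restrict these methods). The class name OffHeapBuffer is hypothetical. Nothing in the code refers to jemalloc: as far as I understand, Unsafe.allocateMemory ends up in the process's malloc, so preloading jemalloc swaps the allocator underneath it.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public final class OffHeapBuffer {
    private static final Unsafe UNSAFE = loadUnsafe();

    private OffHeapBuffer() {}

    // Unsafe.getUnsafe() rejects application callers, so read the singleton field instead.
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Allocates native memory; the caller must release it with free(). */
    public static long allocate(long bytes) {
        return UNSAFE.allocateMemory(bytes);
    }

    /** Releases memory previously returned by allocate(). */
    public static void free(long address) {
        UNSAFE.freeMemory(address);
    }

    public static void putLong(long address, long value) {
        UNSAFE.putLong(address, value);
    }

    public static long getLong(long address) {
        return UNSAFE.getLong(address);
    }

    public static void main(String[] args) {
        long address = allocate(8);
        try {
            putLong(address, 42L);
            System.out.println(getLong(address));
        } finally {
            free(address);
        }
    }
}
```

To benchmark against jemalloc, the JVM would be launched with something like LD_PRELOAD=/path/to/libjemalloc.so java OffHeapBuffer, where the library path is a placeholder for wherever jemalloc is installed.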

I am trying out the JMH benchmark and looking at the new linked implementation right now.
How are you starting the JMH benchmark?




> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Robert Stupp
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off heap; keys are still stored in the JVM heap as BB.
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
> * The memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead, and as few memcpys as possible.
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
