cassandra-commits mailing list archives

From "Vijay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
Date Tue, 25 Nov 2014 17:50:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224904#comment-14224904 ]

Vijay commented on CASSANDRA-7438:
----------------------------------

{quote}
sun.misc.Hashing doesn't seem to exist for me, maybe a Java 8 issue?
StatsHolder, same AtomicLongArray suggestion. Also consider LongAdder.
{quote}
Yep, let me find an alternative to sun.misc.Hashing for Java 8 (and an alternative to LongAdder until we are on Java 8).
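To illustrate the LongAdder suggestion, here is a minimal sketch assuming Java 8. The class and method names are hypothetical and not taken from the patch; it only shows per-counter LongAdder fields replacing contended AtomicLong counters.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stats holder; names are illustrative, not from the patch.
final class StatsHolderSketch {
    // LongAdder spreads updates over internal cells, so hot counters
    // contend less than a single AtomicLong under many writer threads.
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit()  { hits.increment(); }
    void recordMiss() { misses.increment(); }

    // sum() is not an atomic snapshot, which is usually fine for statistics.
    long hits()   { return hits.sum(); }
    long misses() { return misses.sum(); }

    double hitRate() {
        long h = hits.sum();
        long m = misses.sum();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}
```

Before Java 8 the same interface could sit on top of AtomicLong or AtomicLongArray, at the cost of more contention on hot counters.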
{quote}
The queue really needs to be bounded, producer and consumer could proceed at different rates.
In Segment.java in the replace path AtomicLong.addAndGet is called back to back, could be
called once with the math already done. I believe each of those stalls processing until the
store buffers have flushed. The put path does something similar and could have the same optimization.
{quote}
Yeah, those were an oversight.
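A minimal sketch of the single-update idea for the replace path. The class, field, and method names are hypothetical; the actual accounting in Segment.java may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical size accounting; names are illustrative, not from Segment.java.
final class SegmentSizeSketch {
    private final AtomicLong usedBytes = new AtomicLong();

    // put: one atomic update for the new entry.
    long put(long newSize) {
        return usedBytes.addAndGet(newSize);
    }

    // replace: compute the net change first, then issue a single atomic
    // update instead of addAndGet(-oldSize) followed by addAndGet(newSize).
    long replace(long oldSize, long newSize) {
        return usedBytes.addAndGet(newSize - oldSize);
    }
}
```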
{quote}
Tasks submitted to executor services via submit will wrap the result including exceptions
in a future which silently discards them. 
The library might take at initialization time a listener for these errors, or if it is going
to be C* specific it could use the wrapped runnable or similar.
{quote}
Are you suggesting configurable logging/exception handling in case the 2 threads throw exceptions?
If yes, sure. Other exceptions, AFAIK, are already propagated (still needs cleanup, though).
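One way the listener idea could look, as a sketch only: the wrap helper and the Consumer-based listener are assumptions for illustration, not an existing API of the library or of C*. The point is that a task passed to submit() must report its own failure, because the returned Future otherwise holds the exception until someone calls get().

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical helper; names are illustrative.
final class ReportingTasks {
    // Wraps a task so any throwable is handed to the listener instead of
    // being captured silently by the Future returned from submit().
    static Runnable wrap(Runnable task, Consumer<Throwable> onError) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                onError.accept(t);
            }
        };
    }

    // Demo: runs one wrapped task and returns what the listener saw (or null).
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(wrap(task, seen::set));
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }
}
```

In a real deployment the listener would typically log the error or forward it to whatever handler was registered at initialization time.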
{quote}
A lot of locking that was spin locking (which unbounded I don't think is great) is now blocking
locking. There is no adaptive spinning if you don't use synchronized. If you are already using
unsafe maybe you could do monitor enter/exit. Never tried it.
Having the table (segments) on heap is pretty undesirable to me. Happy to be proved wrong,
but I think a flyweight over off heap would be better.
{quote}
Segments are small in memory so far in my tests. The spin lock is there to make sure the locker
checks whether the segment was rehashed; this is better than having a separate lock, which would
be central (no different than Java or memcached).
Not sure I understand the UNSAFE lock; an example would help.
The segments are on heap mainly to handle the locking. I think we can do a bit of CAS, but a
global lock on rehashing will be a problem (maybe an alternate approach is required).
{quote}
It looks like concurrent calls to rehash could cause the table to rehash twice since the rebalance
field is not CASed. You should do the volatile read, and then attempt the CAS (avoids putting
the cache line in exclusive state every time).
{quote}
Nope, it is a single-threaded executor and the rehash boolean is already volatile :)
The next commit will have conditions instead (similar to the C implementation).
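For reference, the read-then-CAS pattern from the quoted comment would look roughly like this if rehash were ever triggered from more than one thread. This is a sketch with hypothetical names; with a single-threaded executor it is not required.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard; names are illustrative, not from the patch.
final class RehashGuardSketch {
    private final AtomicBoolean rehashing = new AtomicBoolean(false);

    // The cheap volatile read comes first, so the CAS (which needs the
    // cache line in exclusive state) only runs when it has a chance to win.
    boolean tryBeginRehash() {
        return !rehashing.get() && rehashing.compareAndSet(false, true);
    }

    void endRehash() {
        rehashing.set(false);
    }
}
```

Only the caller that gets true from tryBeginRehash() performs the rehash, so two concurrent callers cannot both start one.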
{quote}
If the expiration lock is already locked some other thread is doing the expiration work. You
might keep a semaphore for puts that bypass the lock so other threads can move on during expiration.
I suppose after the first few evictions new puts will move on anyways. This would show up
in a profiler if it were happening.
{quote}
Good point… Or a tryLock to spin and check if some other thread released enough memory.
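A rough sketch of the tryLock idea, under stated assumptions: the class, the freeBytes counter, and the evict stand-in are all hypothetical, and the retry count is bounded on purpose so a put never spins forever.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names are illustrative, not from the patch.
final class EvictionSketch {
    private final ReentrantLock expiryLock = new ReentrantLock();
    private final AtomicLong freeBytes;

    EvictionSketch(long initialFree) {
        freeBytes = new AtomicLong(initialFree);
    }

    // Stand-in for evicting LRU entries; here it only credits the memory back.
    private void evict(long bytes) {
        freeBytes.addAndGet(bytes);
    }

    // Tries to reserve 'needed' bytes, giving up after maxAttempts rounds.
    boolean reserve(long needed, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            long free = freeBytes.get();
            if (free >= needed) {
                if (freeBytes.compareAndSet(free, free - needed)) {
                    return true;
                }
                continue; // lost a race with another reservation; re-check
            }
            if (expiryLock.tryLock()) {
                try {
                    evict(needed - free); // this thread does the expiration work
                } finally {
                    expiryLock.unlock();
                }
            } else {
                Thread.yield(); // someone else is evicting; check again shortly
            }
        }
        return false;
    }

    long free() {
        return freeBytes.get();
    }
}
```

A caller that gets false back could skip caching the entry rather than block.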
{quote}
hotN looks like it could lock for quite a while (hundreds of milliseconds, seconds) depending
on the size of N. You don't need to use a linked list for the result just allocate an array
list of size N. Maybe hotN should be able to yield, possibly leaving behind an iterator that
evictors will have to repair. Maybe also depends on how top N handles duplicate or multiple
versions of keys. Alternatively hotN could take a read lock, and writers could skip the cache?
{quote}
We cannot have duplicates in the queue (remember it is a doubly linked list of the items in the cache).
A read lock on q_expiry_lock is all we need; let me fix it.
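A sketch of hotN under a read lock with a presized ArrayList. It uses an on-heap ArrayDeque as a stand-in for the off-heap queue, and all names are hypothetical; it only illustrates the locking shape, not the actual fix.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical on-heap stand-in for the LRU queue; names are illustrative.
final class HotKeysSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Deque<String> lru = new ArrayDeque<>(); // head = hottest key

    // Moves a key to the head; the remove keeps the queue free of duplicates.
    void touch(String key) {
        lock.writeLock().lock();
        try {
            lru.remove(key);
            lru.addFirst(key);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Copies up to n of the hottest keys while holding only the read lock,
    // so concurrent hotN calls do not block each other.
    List<String> hotN(int n) {
        lock.readLock().lock();
        try {
            int capacity = Math.max(0, Math.min(n, lru.size()));
            List<String> result = new ArrayList<>(capacity);
            Iterator<String> it = lru.iterator();
            while (result.size() < n && it.hasNext()) {
                result.add(it.next());
            }
            return result;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```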

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch
>
>
> Currently SerializingCache is partially off heap; keys are still stored in the JVM heap as BB.
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
> * Memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead, and fewer memcpy's (as much as possible).
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
