cassandra-commits mailing list archives

From "Michael Kjellman (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions
Date Sat, 01 Oct 2016 18:21:21 GMT


Michael Kjellman commented on CASSANDRA-9754:

Oh, another very important update. Originally, I was mmapping 4kb aligned chunks as necessary.
When I finally got things stable, after fixing a few file descriptor leaks and some fun fighting
Java with MemoryByteBuffer objects, I ran the performance load from the stress tool I wrote and
found the performance was randomly *terrible* (like 1.3 SECONDS in the 99.9th percentile).
After some investigation and a ton of instrumentation, I found mmap calls were taking *90+ms* in
the 99th percentile and *70+ms* in the 90th percentile on the hardware I'm using for performance
testing. I looked into the JDK source code to see whether there were any synchronized blocks
in the native code, but it's pretty sane and just calls the mmap syscall. I discussed it a bit
with Norman Maurer and we both came away pretty shocked that mmap could be that slow! These
boxes have 256GB of RAM and there was basically zero disk IO, as everything was in the page
cache as expected. There were a lot of major page faults, but it is really very surprising that
mmap can be so horrible in the upper percentiles.

I ripped out all the mmap logic on the read path and switched to reading the aligned 4kb
chunks directly from the RAF (RandomAccessFile) as needed, and everything looked amazing.
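
For readers who want to picture the change, here is a minimal sketch of reading one aligned 4kb
chunk with a plain seek and read instead of mapping it. This is not the code from the attached
patches: the class name AlignedChunkReader and its methods are hypothetical, and it only assumes
the standard java.io.RandomAccessFile API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper for illustration only; not taken from the 9754 patches.
final class AlignedChunkReader {
    // 4kb chunk size, matching the alignment mentioned in the comment above.
    static final int CHUNK_SIZE = 4096;

    // Round a file offset down to the start of the 4kb chunk that contains it.
    static long alignDown(long offset) {
        return offset & ~((long) CHUNK_SIZE - 1);
    }

    // Read the chunk containing `offset` with seek + readFully rather than mmap.
    // The returned array is shorter than CHUNK_SIZE only for the file's last chunk.
    static byte[] readChunk(RandomAccessFile raf, long offset) throws IOException {
        if (offset < 0) {
            throw new IllegalArgumentException("negative offset: " + offset);
        }
        long start = alignDown(offset);
        long remaining = raf.length() - start;
        if (remaining <= 0) {
            throw new IllegalArgumentException("offset past end of file: " + offset);
        }
        byte[] buf = new byte[(int) Math.min(CHUNK_SIZE, remaining)];
        raf.seek(start);
        raf.readFully(buf);
        return buf;
    }
}
```

When the file is already in the page cache, as described above, each call is one seek plus one
read into a heap buffer, with no mapping to create or release per chunk.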

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>                 Key: CASSANDRA-9754
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>         Attachments: 9754_part1-v1.diff, 9754_part2-v1.diff
>  Looking at a heap dump of a 2.0 cluster, I found that the majority of the objects are IndexInfo
and its ByteBuffers. This is especially bad in endpoints with large CQL partitions. If a CQL
partition is, say, 6.4GB, it will have 100K IndexInfo objects and 200K ByteBuffers. This will
create a lot of churn for GC. Can this be improved by not creating so many objects?
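
The 100K and 200K figures can be reproduced with a rough back-of-the-envelope sketch. It rests on
two assumptions that the description does not state: one IndexInfo entry per 64KB of partition
data (the default column_index_size_in_kb), and two ByteBuffers per IndexInfo (its first and last
names). The class IndexInfoEstimate is hypothetical.

```java
// Hypothetical estimate for illustration; the constants are assumptions, see lead-in.
final class IndexInfoEstimate {
    // Assumed index granularity: the default column_index_size_in_kb of 64.
    static final long INDEX_INTERVAL_BYTES = 64L * 1024;

    // Assumed ByteBuffers held by each IndexInfo (first name and last name).
    static final long BUFFERS_PER_INDEX_INFO = 2;

    // One IndexInfo per started 64KB block of the partition (ceiling division).
    static long indexInfoCount(long partitionBytes) {
        return (partitionBytes + INDEX_INTERVAL_BYTES - 1) / INDEX_INTERVAL_BYTES;
    }

    static long byteBufferCount(long partitionBytes) {
        return BUFFERS_PER_INDEX_INFO * indexInfoCount(partitionBytes);
    }

    public static void main(String[] args) {
        long partitionBytes = 6_400_000_000L; // the 6.4GB partition from the description
        System.out.println("IndexInfo objects: " + indexInfoCount(partitionBytes));
        System.out.println("ByteBuffers:       " + byteBufferCount(partitionBytes));
    }
}
```

Under these assumptions a 6.4GB partition comes out just under 100K IndexInfo objects and just
under 200K ByteBuffers, in line with the rounded numbers in the description.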
