cassandra-user mailing list archives

From Carl Hu <...@carlhu.com>
Subject Re: GC pauses affecting entire cluster.
Date Tue, 02 Jun 2015 02:05:27 GMT
Anuj,

So I did the experiment with the default GC settings but using
memtable_allocation_type with offheap_objects: Cassandra still freezes once
every two hours or so, locking up the cluster. I will try your settings
tomorrow and report back.
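
For anyone following along, this is roughly the cassandra.yaml change under
test -- a minimal sketch for 2.1, where the off-heap space value is purely
illustrative rather than what we necessarily run:

    # cassandra.yaml (Cassandra 2.1) -- sketch of the setting being tested
    # Store memtable cells off-heap to take pressure off the JVM heap.
    memtable_allocation_type: offheap_objects
    # Optional explicit cap on the off-heap memtable budget (illustrative value).
    memtable_offheap_space_in_mb: 2048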

Let me know if anyone else has any suggestions,
Carl


On Mon, Jun 1, 2015 at 4:00 PM, Carl Hu <me@carlhu.com> wrote:

> Thank you for the suggestion. After analyzing your settings, the basic
> hypothesis here is to promote objects to Old Gen very quickly because heap
> usage accumulates rapidly due to memtables. We happen to be running on
> 2.1, and I thought a more conservative approach than your (quite aggressive)
> GC settings would be to try the new memtable_allocation_type with
> offheap_objects and see whether the memtable pressure is relieved sufficiently
> that the standard GC settings can keep up.
>
> The experiment is in progress and I will report back with the results.
>
> On Mon, Jun 1, 2015 at 10:20 AM, Anuj Wadehra <anujw_2003@yahoo.co.in>
> wrote:
>
>> We have a write-heavy workload and used to face promotion failures/long GC
>> pauses with Cassandra 2.0.x. I am not into the code yet, but I think that
>> memtable- and compaction-related objects are medium-lived, and a write-heavy
>> workload is not well suited to generational collection with the default
>> settings. So we tuned the JVM to make sure that as few objects as possible
>> are promoted to Old Gen, and achieved great success with that:
>> MAX_HEAP_SIZE="12G"
>> HEAP_NEWSIZE="3G"
>> -XX:SurvivorRatio=2
>> -XX:MaxTenuringThreshold=20
>> -XX:CMSInitiatingOccupancyFraction=70
>> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20"
>> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
>> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>> JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=30000"
>> JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000"
>> JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
>> JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
>> We also think that the default memtable space (memtable_total_space_in_mb) of
>> 1/4 of the heap is too much for write-heavy loads. By default, young gen is
>> also 1/4 of the heap. We reduced the memtable space to 1000 MB in order to
>> make sure that memtable-related objects don't stay in memory for too long.
>> Combining this with SurvivorRatio=2 and MaxTenuringThreshold=20 did the job
>> well. GC was very consistent, and no Full GC was observed.
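>>
>> As a minimal sketch, that change lives in cassandra.yaml (option name as in
>> 2.0.x; 2.1 splits the budget into memtable_heap_space_in_mb and
>> memtable_offheap_space_in_mb), with the value shown being the one we use:
>>
>>     # cassandra.yaml (2.0.x)
>>     memtable_total_space_in_mb: 1000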
>>
>> Environment: 3-node cluster, each node having 24 cores, 64 GB RAM, and
>> SSDs in RAID 5.
>> We are making around 12k writes/sec across 5 CFs (one with 4 secondary
>> indexes) and 2300 reads/sec on each node of the 3-node cluster. 2 CFs have
>> wide rows with a maximum of around 100 MB of data per row.
>>
>> Yes, a node being marked down has a cascading effect. Within seconds, all
>> nodes in our cluster are marked down.
>>
>> Thanks
>> Anuj Wadehra
>>
>>
>>
>>   On Monday, 1 June 2015 7:12 PM, Carl Hu <me@carlhu.com> wrote:
>>
>>
>> We are running Cassandra version 2.1.5.469 on 15 nodes and are
>> experiencing a problem where the entire cluster slows down for 2.5 minutes
>> when one node experiences a 17 second stop-the-world GC. These GCs happen
>> once every 2 hours. I did find a ticket that seems related to this:
>> https://issues.apache.org/jira/browse/CASSANDRA-3853, but Jonathan Ellis
>> has already resolved that ticket.
>>
>> We are running standard GC settings, but my concern here is not so much the
>> 17 second GC on a single node (after all, we have 14 others) as the
>> cascading performance problem.
>>
>> We are running the standard values of dynamic_snitch_badness_threshold (0.1)
>> and phi_convict_threshold (8). (These values are relevant to the dynamic
>> snitch routing requests away from the frozen node and to the failure detector
>> marking the node as 'down'.)
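>>
>> For reference, the corresponding cassandra.yaml lines (these are the stock
>> defaults; phi_convict_threshold normally ships commented out, with 8 as its
>> implicit default):
>>
>>     # cassandra.yaml -- defaults we are running
>>     dynamic_snitch_badness_threshold: 0.1
>>     # phi_convict_threshold: 8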
>>
>> We use the Python client in its default round-robin mode, so all clients hit
>> the coordinators on all nodes in round robin. One theory is that since the
>> coordinator on every node must hit the frozen node at some point during the
>> 17 seconds, each node's request queues fill up, and the entire cluster thus
>> freezes up. That would explain a 17 second freeze, but it would not explain
>> the 2.5 minute slowdown (a 10x increase in request latency at P50).
>>
>> I'd love your thoughts. I've provided the GC chart here.
>>
>> Carl
>>
>> [image: Inline image 1]
>>
>>
>>
>
