cassandra-commits mailing list archives

From "Benjamin Roth (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12689) All MutationStage threads blocked, kills server
Date Sat, 24 Sep 2016 16:44:20 GMT


Benjamin Roth commented on CASSANDRA-12689:

I guess I found the cause of the problem (at least I'm pretty sure).
There is a race condition in calling Keyspace.apply in a blocking manner from Mutation.apply.
See Mutation line 227, Uninterruptibles.getUninterruptibly(applyFuture(durableWrites))

When this is called AND the lock for the MV update could not be acquired, THEN the apply is
deferred on the MutationStage queue and Mutation.apply waits for this deferred task to finish,
right? So this thread (a MutationStage worker) is blocked until the deferred future completes
and cannot process any other tasks in the mutation queue.
But what if all mutation workers are currently busy and in the same situation? Then the deferred
tasks will never be processed, the futures will never complete, and all workers wait
for futures that can never complete => complete DEADLOCK.

More abstractly: a blocking call in any stage MUST NEVER defer itself on its own stage.
Simple example: imagine a queue with 1 worker. That worker is processing a task from this queue.
This task enqueues another task on the same queue and waits for it to finish. It never will,
as there is only one worker and that worker is now blocked.
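
The single-worker case above can be reproduced outside Cassandra with a plain fixed thread pool.
The following is only a minimal sketch: SelfDeferDemo and outerTaskCompletes are hypothetical
names, and the pool merely stands in for a stage such as MutationStage.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; none of these names come from Cassandra.
class SelfDeferDemo {

    /**
     * Runs an outer task on a pool with the given number of workers. The outer
     * task submits an inner task to the SAME pool and blocks until it finishes.
     * Returns true if the outer task completed within the timeout.
     */
    static boolean outerTaskCompletes(int workers, long timeoutMillis) {
        ExecutorService stage = Executors.newFixedThreadPool(workers);
        try {
            Future<String> outer = stage.submit(() -> {
                Future<String> inner = stage.submit(() -> "applied");
                return inner.get(); // blocks a worker of the same stage
            });
            outer.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the worker is waiting for a task that nobody can run
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            stage.shutdownNow(); // interrupt stuck workers so the JVM can exit
        }
    }

    public static void main(String[] args) {
        // One worker: the inner task stays queued forever, so this prints false.
        System.out.println("1 worker:  completed = " + outerTaskCompletes(1, 1000));
        // Two workers: a free worker runs the inner task, so this prints true.
        System.out.println("2 workers: completed = " + outerTaskCompletes(2, 1000));
    }
}
```

A second worker only helps while it is idle. If every worker of the pool runs such an outer task
at the same time, the pool stalls in the same way regardless of its size.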

Possible solutions:
1. Complete the future before deferring. That would resolve this particular issue, but it would
mean "fire and forget", and that is not what futures are made for.
2. Do not block in Mutation.apply, or use Mutation.applyFuture in critical situations. Probably
a fine solution, but harder to implement and with a big impact on existing code.
3. My personally preferred option: introduce a "deferrable" flag in Keyspace.apply and set it
to false when called from a blocking context. If it is false, don't defer the current apply but
retry in a loop until success or until writeTimeout is reached, maybe with a small sleep time
depending on writeTimeout (e.g. writeTimeout / 100).
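
Option 3 could look roughly like the sketch below. It is not the real Keyspace.apply: the class,
the method signature and the use of a Semaphore as a stand-in for the MV lock are all assumptions
made to keep the example self-contained.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of option 3; not the actual Keyspace.apply code.
class DeferrableApplySketch {

    /**
     * Applies the mutation under mvLock. With deferrable == false (blocking caller)
     * the lock is retried on the calling thread; with deferrable == true the retry
     * is handed back to the stage so the calling thread is freed.
     */
    static CompletableFuture<Void> apply(Runnable mutation, Semaphore mvLock, Executor stage,
                                         boolean deferrable, long writeTimeoutMillis) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(writeTimeoutMillis);
        long sleepMillis = Math.max(1, writeTimeoutMillis / 100); // e.g. writeTimeout / 100
        attempt(mutation, mvLock, stage, deferrable, deadlineNanos, sleepMillis, result);
        return result;
    }

    private static void attempt(Runnable mutation, Semaphore mvLock, Executor stage,
                                boolean deferrable, long deadlineNanos, long sleepMillis,
                                CompletableFuture<Void> result) {
        while (true) {
            if (mvLock.tryAcquire()) {
                try {
                    mutation.run();
                    result.complete(null);
                } catch (RuntimeException e) {
                    result.completeExceptionally(e);
                } finally {
                    mvLock.release();
                }
                return;
            }
            if (System.nanoTime() - deadlineNanos >= 0) {
                result.completeExceptionally(
                    new TimeoutException("MV lock not acquired before write timeout"));
                return;
            }
            if (deferrable) {
                // Non-blocking caller: re-enqueue the attempt and free this thread.
                stage.execute(() -> attempt(mutation, mvLock, stage, true,
                                            deadlineNanos, sleepMillis, result));
                return;
            }
            // Blocking caller: never re-enqueue on the own stage, retry in place.
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                result.completeExceptionally(e);
                return;
            }
        }
    }
}
```

A blocking caller would pass deferrable = false and then wait on the returned future, which in
that case is already completed (normally or with a timeout) when apply returns.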

Apart from all that:
If a caller is waiting (blocking) for a future to finish, it makes absolutely no sense to defer
the work to another thread just to avoid blocking the current thread (see the
comment on line 492 in Keyspace). The caller thread is blocked anyway while it waits for the future
to complete.

Does someone agree?

> All MutationStage threads blocked, kills server
> -----------------------------------------------
>                 Key: CASSANDRA-12689
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Benjamin Roth
>            Priority: Critical
> Under heavy load (e.g. due to repair during normal operations), a lot of NullPointerExceptions
occur in MutationStage. Unfortunately, the log is not very chatty, trace is missing:
> 2016-09-22T06:29:47+00:00 cas6 [MutationStage-1] org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService
Uncaught exception on thread Thread[MutationStage-1,5,main]: {}
> 2016-09-22T06:29:47+00:00 cas6 #011java.lang.NullPointerException: null
> Then, after some time, in most cases ALL threads in MutationStage pools are completely
blocked. This leads to piling up pending tasks until server runs OOM and is completely unresponsive
due to GC. Threads will NEVER unblock until server restart. Even if load goes completely down,
all hints are paused, and no compaction or repair is running. Only restart helps.
> I can understand that pending tasks in MutationStage may pile up under heavy load, but
tasks should be processed and dequeued after load goes down. This is definitely not the case.
This looks more like an unhandled exception leading to a stuck lock.
> Stack trace from jconsole, all Threads in MutationStage show same trace.
> Name: MutationStage-48
> State: WAITING on java.util.concurrent.CompletableFuture$Signaller@fcc8266
> Total blocked: 137  Total waited: 138.513
> Stack trace: 
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(
> java.util.concurrent.CompletableFuture$Signaller.block(
> java.util.concurrent.ForkJoinPool.managedBlock(
> java.util.concurrent.CompletableFuture.waitingGet(
> java.util.concurrent.CompletableFuture.get(
> org.apache.cassandra.db.Mutation.apply(
> org.apache.cassandra.db.Mutation.apply(
> org.apache.cassandra.hints.Hint.apply(
> org.apache.cassandra.hints.HintVerbHandler.doVerb(
> java.util.concurrent.Executors$
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$

