ignite-dev mailing list archives

From "Andrey N. Gura (Jira)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-12523) Continuously generated thread dumps in failure processor slow down the whole system
Date Thu, 09 Jan 2020 15:14:00 GMT
Andrey N. Gura created IGNITE-12523:
---------------------------------------

             Summary: Continuously generated thread dumps in failure processor slow down the whole system
                 Key: IGNITE-12523
                 URL: https://issues.apache.org/jira/browse/IGNITE-12523
             Project: Ignite
          Issue Type: Improvement
            Reporter: Andrey N. Gura
            Assignee: Andrey N. Gura
             Fix For: 2.9


Many threads (hundreds) build indexes. The checkpoint-thread tries to acquire the write lock
but can't, because some threads hold the read lock. Moreover, other threads are also trying to
acquire the read lock. Failure types SYSTEM_WORKER_BLOCKED and SYSTEM_CRITICAL_OPERATION_TIMEOUT
are ignored.

The checkpoint-thread is treated as a blocked critical system worker, so the failure processor
generates a thread dump.

Threads waiting on the read lock report SYSTEM_CRITICAL_OPERATION_TIMEOUT, and each such report
also generates a thread dump.

Thread dump generation takes from 500 to 1000 ms.
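
To get a feel for this cost on a particular JVM, the duration of a single full dump can be measured with the standard ThreadMXBean API. This is only a minimal sketch, not the code Ignite uses to produce its dumps; the class name ThreadDumpTimer and its method are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper for illustration; not part of Ignite. */
final class ThreadDumpTimer {
    /** Takes one full thread dump and returns how long it took, in milliseconds. */
    static long measureDumpMillis() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();

        long start = System.nanoTime();

        // Request lock details only when the JVM supports them;
        // otherwise dumpAllThreads throws UnsupportedOperationException.
        ThreadInfo[] infos = mx.dumpAllThreads(
            mx.isObjectMonitorUsageSupported(),
            mx.isSynchronizerUsageSupported());

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Dumped " + infos.length + " threads in " + elapsedMs + " ms");

        return elapsedMs;
    }
}
```

The measured value depends on the number of live threads and on lock contention, so a small test JVM will usually report far less than the 500 to 1000 ms seen with hundreds of index-building threads.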

All this activity leads to a stop-the-world pause and triggers other timeouts. The pause can last
a long time because many threads are active and half of the time is spent on thread dump generation.

The root cause here is the checkpoint read-write lock. This was discussed with Alexey Goncharuk
([~agoncharuk]), and it seems that only an implementation of fuzzy checkpoints could solve the
problem, but that requires a big effort.

*Solution*

Andrey Gura
December 20, 2019, 3:18 PM

Final solution and implementation:


- A new system property, IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT, is added. Its default
value is the failure detection timeout.

- Each call of the FailureProcessor#process(FailureContext, FailureHandler) method checks the
throttling timeout before generating a thread dump.

- There is no need to check whether the failure type is ignored. Throttling is useful in all
cases where the context is not invalidated (FailureProcessor.failureCtx != null).

- For a throttled thread dump, an info message is logged: "Thread dump is hidden due to throttling
settings. Set IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT property to 0 to see all thread
dumps".
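
The throttling check described in the list above could look roughly like the following sketch. It is not the code committed for this issue: the class ThreadDumpThrottle, its methods, and the use of a plain JVM system property read in milliseconds are assumptions made for illustration. Only the property name and the "0 disables throttling" behavior come from the description.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical throttle for illustration; not Ignite's actual implementation. */
final class ThreadDumpThrottle {
    /** Property name taken from the issue text. */
    static final String PROP = "IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT";

    /** Marker meaning "no dump has been taken yet". */
    private static final long NEVER = Long.MIN_VALUE;

    private final long timeoutMs;
    private final LongSupplier clockMs;
    private final AtomicLong lastDumpMs = new AtomicLong(NEVER);

    ThreadDumpThrottle(long timeoutMs, LongSupplier clockMs) {
        this.timeoutMs = timeoutMs;
        this.clockMs = clockMs;
    }

    /** Assumption: the property is a JVM system property holding milliseconds. */
    static ThreadDumpThrottle fromSystemProperty(long failureDetectionTimeoutMs) {
        long timeout = Long.getLong(PROP, failureDetectionTimeoutMs);

        // Monotonic clock, so wall-clock adjustments do not affect throttling.
        return new ThreadDumpThrottle(timeout, () -> System.nanoTime() / 1_000_000);
    }

    /** Returns true if a thread dump may be generated now, false if it is throttled. */
    boolean tryAcquire() {
        if (timeoutMs <= 0)
            return true; // 0 means "show all thread dumps".

        long now = clockMs.getAsLong();

        while (true) {
            long last = lastDumpMs.get();

            if (last != NEVER && now - last < timeoutMs)
                return false; // A dump was taken recently; skip this one.

            // Only one of the concurrently failing threads wins the right to dump.
            if (lastDumpMs.compareAndSet(last, now))
                return true;
        }
    }
}
```

A caller in the failure-processing path would generate the dump only when tryAcquire() returns true and log the info message otherwise. The compare-and-set keeps the check cheap when hundreds of threads report timeouts at the same moment, which is the situation this issue describes.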



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
