ignite-dev mailing list archives

From "Semen Boikov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-4798) Cluster does not finish rebalancing after nodes leaving
Date Tue, 07 Mar 2017 14:01:38 GMT
Semen Boikov created IGNITE-4798:
------------------------------------

             Summary: Cluster does not finish rebalancing after nodes leaving
                 Key: IGNITE-4798
                 URL: https://issues.apache.org/jira/browse/IGNITE-4798
             Project: Ignite
          Issue Type: Bug
            Reporter: Denis Kholodov


   
Hi Valentin,

I managed to reproduce the stability issue we've been having in production in a relatively
sterile environment.
The logs and stack traces are accessible here: https://drive.google.com/open?id=0B1YMrCiHZq1PMWJsblBYSXhaX1k

Steps to reproduce:
1. Start up a cluster of 223 nodes.
2. Wait for everything to stabilize (took about 2 minutes).
3. Shut down 112 nodes.
4. Wait for everything to stabilize.

From that point on, I can't connect client nodes to the cluster:
2017-02-15 23:13:16.396 WARN  o.a.i.i.p.c.GridCachePartitionExchangeManager main ctx: actor: - Failed to wait for initial partition map exchange.
Possible reasons are:
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.

Other cache operations are also stuck.
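
To check which of the collected node logs contain this warning, something like the following sketch could be used. It assumes each log record sits on a single line in the layout shown above (timestamp, level, logger, thread, then " - " before the message); the function name `find_exchange_warnings` and the regular expression are illustrative and are not part of Ignite.

```python
import re
from datetime import datetime

# Assumed record layout (taken from the excerpt above, one record per line):
#   <date> <time> <LEVEL> <logger> <thread> ... - <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+.*?-\s+(?P<msg>.*)$"
)

# Message text quoted from the warning in this report.
NEEDLE = "Failed to wait for initial partition map exchange"


def find_exchange_warnings(lines):
    """Return (timestamp, logger) pairs for records carrying the warning."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and NEEDLE in match.group("msg"):
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            hits.append((ts, match.group("logger")))
    return hits
```

Passing an open log file (or any iterable of lines) to `find_exchange_warnings` gives the times at which each node reported the stalled exchange, which can then be compared with the time the 112 nodes were stopped.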

Let me know what other information I can provide.

 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
