asterixdb-dev mailing list archives

From "Yingyi Bu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ASTERIXDB-1076) False failures triggers denying new queries
Date Tue, 18 Aug 2015 17:55:45 GMT
Yingyi Bu created ASTERIXDB-1076:
------------------------------------

             Summary: False failures triggers denying new queries
                 Key: ASTERIXDB-1076
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1076
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: AsterixDB
            Reporter: Yingyi Bu
            Priority: Critical


When the CPUs in the cluster are saturated by computation, heartbeats from slave nodes to
the master node might be delayed. In this case, the master node concludes that a node has
failed and can no longer add the node back. Hence, the entire cluster becomes unusable and
an instance restart is needed.

Two things need to be fixed:
1.  (at least) expose AsterixDB configuration parameters that allow users to set a larger
heartbeat threshold;
2.  allow a node to leave and re-join a Hyracks cluster.
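
For illustration, the two fixes above can be sketched as a small master-side liveness
tracker. This is only a sketch under stated assumptions, not the Hyracks implementation:
the class HeartbeatMonitor and the parameters heartbeatPeriodMs and maxMissedHeartbeats
are hypothetical names. The threshold is configurable (fix 1), and a heartbeat from a node
that was previously marked dead lets it re-join (fix 2).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical master-side liveness tracker; all names here are assumptions. */
class HeartbeatMonitor {
    private final long heartbeatPeriodMs;   // assumed user-configurable
    private final int maxMissedHeartbeats;  // assumed user-configurable
    private final Map<String, Long> lastSeenMs = new HashMap<>();
    private final Set<String> deadNodes = new HashSet<>();

    HeartbeatMonitor(long heartbeatPeriodMs, int maxMissedHeartbeats) {
        if (heartbeatPeriodMs <= 0 || maxMissedHeartbeats <= 0) {
            throw new IllegalArgumentException("period and threshold must be positive");
        }
        this.heartbeatPeriodMs = heartbeatPeriodMs;
        this.maxMissedHeartbeats = maxMissedHeartbeats;
    }

    /**
     * Records a heartbeat. A node previously marked dead is allowed back in.
     * Returns true if this heartbeat caused the node to re-join.
     */
    synchronized boolean heartbeat(String nodeId, long nowMs) {
        lastSeenMs.put(nodeId, nowMs);
        return deadNodes.remove(nodeId);
    }

    /**
     * Marks as dead every node whose last heartbeat is older than
     * heartbeatPeriodMs * maxMissedHeartbeats. Returns the newly dead nodes.
     */
    synchronized Set<String> sweep(long nowMs) {
        long thresholdMs = heartbeatPeriodMs * maxMissedHeartbeats;
        Set<String> newlyDead = new HashSet<>();
        for (Map.Entry<String, Long> e : lastSeenMs.entrySet()) {
            boolean late = nowMs - e.getValue() > thresholdMs;
            if (late && deadNodes.add(e.getKey())) {
                newlyDead.add(e.getKey());
            }
        }
        return newlyDead;
    }

    /** A node is alive if it has sent a heartbeat and is not currently marked dead. */
    synchronized boolean isAlive(String nodeId) {
        return lastSeenMs.containsKey(nodeId) && !deadNodes.contains(nodeId);
    }
}
```

With a larger maxMissedHeartbeats, a CPU-saturated node has more time before sweep marks
it dead, and a late heartbeat restores it instead of leaving it excluded permanently.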

In the long term, we might need to investigate better liveness check strategies.


To reproduce the issue, overload the slave nodes' CPUs; the master will then report false node failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
