asterixdb-dev mailing list archives

From "Yingyi Bu (JIRA)" <>
Subject [jira] [Created] (ASTERIXDB-1076) False failures triggers denying new queries
Date Tue, 18 Aug 2015 17:55:45 GMT
Yingyi Bu created ASTERIXDB-1076:

             Summary: False failures triggers denying new queries
                 Key: ASTERIXDB-1076
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: AsterixDB
            Reporter: Yingyi Bu
            Priority: Critical

When CPUs in the cluster are saturated by computation, heartbeats from slave nodes to
the master node can be delayed.  In that case, the master node concludes that a node has
failed and can no longer add the node back.  Hence, the entire cluster becomes unusable and
an instance restart is needed.

Two things need to be fixed:
1.  (At least) expose AsterixDB configuration parameters that allow users to set a larger heartbeat value.
2.  Allow a node to leave and re-join a Hyracks cluster.
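
The two fixes listed above can be sketched roughly as follows. This is only an illustrative model of a master-side liveness check, not the Hyracks implementation: the class LivenessTracker, its fields (heartbeat_period_s, max_missed_heartbeats, allow_rejoin) and the default values are all hypothetical names and numbers chosen for the example. Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class LivenessTracker:
    """Hypothetical master-side view of node liveness (not the Hyracks API)."""

    heartbeat_period_s: float = 10.0  # illustrative default; assumed user-configurable (fix 1)
    max_missed_heartbeats: int = 5    # illustrative default; assumed user-configurable (fix 1)
    allow_rejoin: bool = True         # fix 2: a node marked dead may come back
    last_seen: dict = field(default_factory=dict)
    dead: set = field(default_factory=set)

    @property
    def timeout_s(self) -> float:
        # A node is considered failed after this much silence.
        return self.heartbeat_period_s * self.max_missed_heartbeats

    def heartbeat(self, node_id: str, now: float) -> bool:
        """Record a heartbeat; return True if the node is accepted as alive."""
        if node_id in self.dead:
            if not self.allow_rejoin:
                # Behaviour described in the report: the node is never added back.
                return False
            self.dead.discard(node_id)
        self.last_seen[node_id] = now
        return True

    def sweep(self, now: float) -> set:
        """Mark nodes whose last heartbeat is older than the timeout as dead."""
        newly_dead = {
            node for node, seen in self.last_seen.items()
            if node not in self.dead and now - seen > self.timeout_s
        }
        self.dead |= newly_dead
        return newly_dead

    def is_alive(self, node_id: str) -> bool:
        return node_id in self.last_seen and node_id not in self.dead


tracker = LivenessTracker(heartbeat_period_s=1.0, max_missed_heartbeats=3)
tracker.heartbeat("nc1", now=0.0)
tracker.sweep(now=5.0)             # heartbeat delayed past the timeout: nc1 is marked dead
tracker.heartbeat("nc1", now=6.0)  # the late heartbeat re-admits nc1 instead of being rejected
```

Raising heartbeat_period_s or max_missed_heartbeats widens the window before a busy node is declared failed, while allow_rejoin covers the case where the window is still exceeded.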

In the long term, we might need to investigate better liveness check strategies.

To reproduce the issue, overload the slave nodes' CPUs and you will see it.

This message was sent by Atlassian JIRA
