accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-2561) Make o.a.a.server.util.Halt test-friendly
Date Wed, 02 Apr 2014 14:29:21 GMT


Eric Newton commented on ACCUMULO-2561:

{{Halt}} could be a replaceable singleton.
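
One way to read "replaceable singleton" is sketched below. This is an illustration, not the existing {{o.a.a.server.util.Halt}} code: the names {{HaltSketch}}, {{Halter}}, {{HaltedException}} and {{replace}} are all hypothetical. The default strategy still calls {{Runtime.halt()}}, while a test can swap in one that throws instead.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a replaceable singleton; not the actual Accumulo Halt class.
final class HaltSketch {

  /** Strategy that decides what "halt" actually does. */
  interface Halter {
    void halt(int status, String message);
  }

  /** Default behavior: log and terminate the JVM at once, skipping shutdown hooks. */
  static final Halter JVM_HALTER = (status, message) -> {
    System.err.println("FATAL " + message);
    Runtime.getRuntime().halt(status);
  };

  /** Thrown by the test-friendly halter in place of killing the JVM. */
  static final class HaltedException extends RuntimeException {
    final int status;

    HaltedException(int status, String message) {
      super(message);
      this.status = status;
    }
  }

  /** Test-friendly behavior: surface the halt request as an exception. */
  static final Halter THROWING_HALTER = (status, message) -> {
    throw new HaltedException(status, message);
  };

  // The single, swappable instance; starts out as the real JVM halter.
  private static final AtomicReference<Halter> CURRENT = new AtomicReference<>(JVM_HALTER);

  /** Installs a new halter and returns the previous one so a test can restore it. */
  static Halter replace(Halter replacement) {
    return CURRENT.getAndSet(replacement);
  }

  /** Entry point that server code would call on a catastrophic failure. */
  static void halt(int status, String message) {
    CURRENT.get().halt(status, message);
  }

  private HaltSketch() {}
}
```

A test would call {{HaltSketch.replace(HaltSketch.THROWING_HALTER)}} in its setup, assert on the {{HaltedException}}, and restore the previous halter afterwards. Server code keeps calling the static {{halt}} method and needs no changes.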

The decision to call halt is due to some catastrophic failure:

 * loss of distributed lock (as described in the BigTable paper)
 * detection of high GC overhead, which precedes distributed lock failure
 * bad system password detected (defense against a misconfigured cluster)
 * failure to open/use a communications port
 * filesystem transitions to read-only

Some of these could be translated into exceptions that bubble back to a controlled shutdown,
but some of them happen in monitoring threads, and the cost of corruption from multiple masters
or tablet assignment is just too great to let all the threads finish gracefully.
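
To illustrate the monitoring-thread case: an exception thrown on a background thread ends only that thread and never reaches the thread that would run a controlled shutdown, so the watcher has to act directly. The sketch below is hypothetical ({{LockWatcherSketch}}, {{watch}} and its parameters are made-up names, and polling a {{BooleanSupplier}} stands in for however the lock is really monitored). The halt action is passed in so that a test can observe it.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a monitoring thread; not Accumulo's actual lock watcher.
final class LockWatcherSketch {

  /**
   * Polls lockHeld on a daemon thread and runs onLost once the lock is gone.
   * Production code might pass () -> Runtime.getRuntime().halt(-1) as onLost;
   * a test can pass a callback that merely records the event.
   */
  static Thread watch(BooleanSupplier lockHeld, Runnable onLost, long pollMillis) {
    Thread watcher = new Thread(() -> {
      try {
        while (lockHeld.getAsBoolean()) {
          Thread.sleep(pollMillis);
        }
        // The lock is gone: act right here, because throwing from this thread
        // would not propagate to any other thread.
        onLost.run();
      } catch (InterruptedException e) {
        // Interrupted while sleeping: restore the flag and stop watching.
        Thread.currentThread().interrupt();
      }
    }, "lock-watcher");
    watcher.setDaemon(true);
    watcher.start();
    return watcher;
  }

  private LockWatcherSketch() {}
}
```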

Only the last check really needs an explanation.  Originally, this check was done in the logger,
and it was useful to avoid trying to create new WALogs on failing machines.  But it has proven
useful to find these nodes and notify administrators immediately.

> Make o.a.a.server.util.Halt test-friendly
> -----------------------------------------
>                 Key: ACCUMULO-2561
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Bill Havanki
>            Priority: Minor
>              Labels: testability
> The various servers use {{o.a.a.server.util.Halt}} to terminate the JVM if something
goes wrong. The {{Halt}} class calls {{Runtime.halt()}}, which of course would torpedo any testing
going on. The mechanism should be reworked to not kill the JVM in a test scenario.

This message was sent by Atlassian JIRA
