hbase-issues mailing list archives

From "Chia-Ping Tsai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-19624) TestIOFencing hangs
Date Mon, 25 Dec 2017 19:53:02 GMT

     [ https://issues.apache.org/jira/browse/HBASE-19624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chia-Ping Tsai updated HBASE-19624:
-----------------------------------
    Status: Patch Available  (was: Open)

> TestIOFencing hangs
> -------------------
>
>                 Key: HBASE-19624
>                 URL: https://issues.apache.org/jira/browse/HBASE-19624
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>             Fix For: 2.0.0
>
>         Attachments: HBASE-19624.v0.patch
>
>
> The RS calls CompactSplit#join to stop all compaction/split threads.
> {code:title=CompactSplit.java}
>   private void waitFor(ThreadPoolExecutor t, String name) {
>     boolean done = false;
>     while (!done) {
>       try {
>         done = t.awaitTermination(60, TimeUnit.SECONDS);
>         LOG.info("Waiting for " + name + " to finish...");
>         if (!done) {
>           t.shutdownNow();
>         }
>       } catch (InterruptedException ie) {
>         LOG.warn("Interrupted waiting for " + name + " to finish...");
>       }
>     }
>   }
> {code}
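For illustration only, here is a minimal sketch of a waitFor variant that treats an interrupt as a reason to call shutdownNow rather than a reason to skip it. It is not the attached HBASE-19624.v0.patch; the class name InterruptAwareShutdown, the configurable timeout, the boolean return value and the use of System.out instead of LOG are assumptions made to keep it self-contained. The sleeping task is a hypothetical stand-in for a compaction thread blocked on a WAL sync.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InterruptAwareShutdown {

  /**
   * Hypothetical variant of CompactSplit#waitFor. On timeout OR interrupt it calls
   * shutdownNow(), so an interrupt can no longer skip that step. Returns true if the
   * calling thread was interrupted while waiting; the interrupt flag is restored.
   */
  static boolean waitFor(ExecutorService t, String name, long timeout, TimeUnit unit) {
    boolean interrupted = false;
    boolean done = false;
    while (!done) {
      try {
        done = t.awaitTermination(timeout, unit);
        if (!done) {
          System.out.println("Waiting for " + name + " to finish...");
          t.shutdownNow(); // interrupt workers that are still running
        }
      } catch (InterruptedException ie) {
        interrupted = true;
        System.out.println("Interrupted waiting for " + name + ", calling shutdownNow()");
        t.shutdownNow(); // the step the quoted code skips
      }
    }
    if (interrupted) {
      Thread.currentThread().interrupt(); // let the caller see the interrupt
    }
    return interrupted;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    // Hypothetical stand-in for a compaction thread stuck waiting on a WAL sync.
    pool.execute(() -> {
      try {
        Thread.sleep(TimeUnit.MINUTES.toMillis(5));
      } catch (InterruptedException e) {
        // shutdownNow() interrupted us; exit promptly
      }
    });
    pool.shutdown();                    // stop accepting tasks; the running one continues
    Thread.currentThread().interrupt(); // simulate the interrupt sent during shutdown
    boolean interrupted = waitFor(pool, "demo-pool", 1, TimeUnit.SECONDS);
    System.out.println("terminated=" + pool.isTerminated() + ", interrupted=" + interrupted);
  }
}
```

Restoring the interrupt flag only after the loop is deliberate: setting it inside the loop would make the next awaitTermination throw immediately and turn the loop into a busy spin.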
> In the meantime, the async WAL may be waiting for the sync signal. However, the signal never arrives because the WAL sync failed.
> {code}
>   synchronized long get(long timeoutNs) throws InterruptedException,
>       ExecutionException, TimeoutIOException {
>     final long done = System.nanoTime() + timeoutNs;
>     while (!isDone()) {
>       wait(1000);
>       if (System.nanoTime() >= done) {
>         throw new TimeoutIOException(
>             "Failed to get sync result after " + TimeUnit.NANOSECONDS.toMillis(timeoutNs)
>                 + " ms for txid=" + this.txid + ", WAL system stuck?");
>       }
>     }
>     if (this.throwable != null) {
>       throw new ExecutionException(this.throwable);
>     }
>     return this.doneTxid;
>   }
> {code}
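A thread parked in this loop has only three ways out: the future is marked done, the deadline passes, or the thread is interrupted (Object#wait throws InterruptedException). If the failed sync never signals the future, an interrupt is the only fast exit. Below is a minimal sketch of that behaviour, assuming a hypothetical MiniSyncFuture with a complete(Throwable) method (not HBase's real SyncFuture API) and java.util.concurrent.TimeoutException in place of TimeoutIOException.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical, stripped-down stand-in for the WAL sync future quoted above. */
public class MiniSyncFuture {
  private boolean done;
  private Throwable throwable;

  /** Called by a (hypothetical) syncer thread; wakes every waiter. */
  synchronized void complete(Throwable t) {
    this.throwable = t;
    this.done = true;
    notifyAll();
  }

  /** Same loop shape as the quoted get(long): poll every second until done or deadline. */
  synchronized void get(long timeoutNs)
      throws InterruptedException, ExecutionException, TimeoutException {
    final long deadline = System.nanoTime() + timeoutNs;
    while (!done) {
      wait(1000); // throws InterruptedException if this thread is interrupted
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("No sync result after "
            + TimeUnit.NANOSECONDS.toMillis(timeoutNs) + " ms");
      }
    }
    if (throwable != null) {
      throw new ExecutionException(throwable);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    MiniSyncFuture future = new MiniSyncFuture(); // never completed: the sync "failed"
    Thread waiter = new Thread(() -> {
      long start = System.nanoTime();
      try {
        future.get(TimeUnit.MINUTES.toNanos(5));
        System.out.println("sync completed");
      } catch (InterruptedException e) {
        System.out.println("released by interrupt after "
            + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start) + " ms");
      } catch (ExecutionException | TimeoutException e) {
        System.out.println("sync failed: " + e);
      }
    });
    waiter.start();
    Thread.sleep(200);
    waiter.interrupt(); // what shutdownNow() would do to a pool worker
    waiter.join();
  }
}
```

Without the interrupt, the waiter in this sketch would sit in the loop for the full five-minute timeout passed to get.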
> When we shut down the mini cluster, JVMClusterUtil#shutdown sends an interrupt signal to all RS threads. Catching the InterruptedException sends CompactSplit#waitFor straight back to awaitTermination and skips #shutdownNow, so the compaction/split threads stay alive until the sync timeout expires (5 min by default).
  
> {code}
>       for (int i = 0; i < 100; ++i) {
>         boolean atLeastOneLiveServer = false;
>         for (RegionServerThread t : regionservers) {
>           if (t.isAlive()) {
>             atLeastOneLiveServer = true;
>             try {
>               LOG.warn("RegionServerThreads remaining, give one more chance before interrupting");
>               t.join(1000);
>             } catch (InterruptedException e) {
>               wasInterrupted = true;
>             }
>           }
>         }
>         if (!atLeastOneLiveServer) break;
>         for (RegionServerThread t : regionservers) {
>           if (t.isAlive()) {
>             LOG.warn("RegionServerThreads taking too long to stop, interrupting");
>             t.interrupt();
>           }
>         }
>       }
> {code}
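The interaction between the two loops can be reproduced in miniature. The sketch below copies the control flow of the quoted waitFor and interrupts the waiting thread more often than its awaitTermination timeout, the way the shutdown loop above re-interrupts live RS threads. The class SwallowedInterruptDemo, the sleeping task (standing in for a compaction thread blocked on a failed WAL sync) and the 100 ms join / 10 s waitFor timeout are hypothetical, scaled-down assumptions, not HBase code or defaults.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SwallowedInterruptDemo {

  /** Same control flow as the quoted CompactSplit#waitFor, with a configurable timeout. */
  static void quotedWaitFor(ExecutorService t, long timeoutMs, AtomicInteger shutdownNowCalls) {
    boolean done = false;
    while (!done) {
      try {
        done = t.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        if (!done) {
          shutdownNowCalls.incrementAndGet();
          t.shutdownNow();
        }
      } catch (InterruptedException ie) {
        // Interrupt swallowed: loop straight back to awaitTermination without shutdownNow().
      }
    }
  }

  /**
   * Interrupts the waiting thread up to `rounds` times, roughly every `joinMs` ms, and
   * reports whether the blocked worker is still running afterwards. Cleans up before returning.
   */
  static boolean stillStuckAfterInterrupts(int rounds, long joinMs, long waitForTimeoutMs,
      AtomicInteger shutdownNowCalls) throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    // Hypothetical stand-in for a compaction thread blocked on a WAL sync that never completes.
    pool.execute(() -> {
      try {
        Thread.sleep(TimeUnit.MINUTES.toMillis(5));
      } catch (InterruptedException e) {
        // Only shutdownNow() gets the worker out of here early.
      }
    });
    pool.shutdown();

    // Hypothetical stand-in for the region server thread running CompactSplit#join.
    Thread rs = new Thread(() -> quotedWaitFor(pool, waitForTimeoutMs, shutdownNowCalls));
    rs.start();

    // Scaled-down version of the mini-cluster shutdown loop: join briefly, then interrupt.
    for (int i = 0; i < rounds && rs.isAlive(); i++) {
      rs.join(joinMs);
      rs.interrupt();
    }
    boolean stuck = rs.isAlive() && !pool.isTerminated();

    pool.shutdownNow(); // clean up so the demo can exit
    rs.join();
    return stuck;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger calls = new AtomicInteger();
    boolean stuck = stillStuckAfterInterrupts(10, 100, 10_000, calls);
    System.out.println("worker still stuck=" + stuck + ", shutdownNow calls=" + calls.get());
  }
}
```

While interrupts keep arriving faster than the awaitTermination timeout, the `!done` branch is never reached, so shutdownNow is never called from waitFor and the blocked worker is never interrupted.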



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
