accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Created] (ACCUMULO-4197) Intermittent failure of HalfDeadTServerIT
Date Thu, 21 Apr 2016 18:13:25 GMT
Josh Elser created ACCUMULO-4197:

             Summary: Intermittent failure of HalfDeadTServerIT
                 Key: ACCUMULO-4197
             Project: Accumulo
          Issue Type: Bug
          Components: test
            Reporter: Josh Elser
            Priority: Minor
             Fix For: 1.6.6, 1.7.2, 1.8.0

I observed an intermittent failure of HalfDeadTServerIT#testRecover today.

This test injects pauses into the C read and write calls inside the tserver. testRecover injects
a 10-second pause to show that the TServer can reconnect to ZK without losing its session.

However, the RPC timeout is 5 seconds. If a minor compaction is triggered, the pause causes
that compaction to take >10 seconds, which trips the "hold timeout" and causes an exception
to be thrown to the client:

2016-04-20 03:39:26,999 [tserver.TabletServer$ThriftClientHandler] ERROR: Commits are held
org.apache.accumulo.tserver.HoldTimeoutException: Commits are held
	at org.apache.accumulo.tserver.TabletServerResourceManager.waitUntilCommitsAreEnabled(
	at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.flush(
	at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.closeUpdate(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(
	at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(
	at com.sun.proxy.$Proxy21.closeUpdate(Unknown Source)
	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$closeUpdate.getResult(
	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$closeUpdate.getResult(
	at org.apache.thrift.ProcessFunction.process(
	at org.apache.thrift.TBaseProcessor.process(
	at org.apache.accumulo.server.rpc.TimedProcessor.process(
	at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(
	at org.apache.accumulo.server.rpc.CustomNonBlockingServer$
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$
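
The timing argument above can be illustrated with a small stand-alone model. This is only a
sketch, not Accumulo code: the class, the method names, and the 100 ms base compaction time
are hypothetical, and it assumes that commits are held for the full length of a minor
compaction and that the hold timeout equals the 5 second RPC timeout.

```java
public class HoldTimeoutSketch {

    /** Hypothetical stand-in for the exception in the log above; not Accumulo's class. */
    static class HoldTimeout extends RuntimeException {
        HoldTimeout(String msg) {
            super(msg);
        }
    }

    /**
     * Returns how long a commit would be held, in milliseconds.
     * Assumption: commits are held while a minor compaction runs, and an
     * injected I/O pause stretches that compaction by the pause length.
     */
    static long heldMillis(boolean compactionRunning, long baseCompactionMillis, long pauseMillis) {
        if (!compactionRunning) {
            return 0L; // no compaction, so nothing holds the commit
        }
        return baseCompactionMillis + pauseMillis;
    }

    /** Throws HoldTimeout when the hold outlasts the timeout. */
    static void commit(long heldMillis, long holdTimeoutMillis) {
        if (heldMillis > holdTimeoutMillis) {
            throw new HoldTimeout("Commits are held");
        }
    }

    /** True when the modelled commit fails with a hold timeout. */
    public static boolean commitFails(boolean compactionRunning, long pauseMillis, long timeoutMillis) {
        try {
            // 100 ms is an arbitrary, assumed duration for an unpaused compaction.
            commit(heldMillis(compactionRunning, 100L, pauseMillis), timeoutMillis);
            return false;
        } catch (HoldTimeout e) {
            return true;
        }
    }

    public static void main(String[] args) {
        long pause = 10_000L;  // 10 second pause injected by testRecover
        long timeout = 5_000L; // 5 second RPC timeout from the report
        System.out.println("no compaction during pause -> fails: "
                + commitFails(false, pause, timeout));
        System.out.println("compaction during pause    -> fails: "
                + commitFails(true, pause, timeout));
    }
}
```

In this model the commit only fails when a compaction coincides with the pause, which matches
the intermittent behavior: the same pause is harmless when no compaction is running.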

I believe the failure is intermittent because it depends on whether a minor compaction is
triggered after the I/O pause begins. We could probably work around this by increasing the
threshold for running a minor compaction.

