accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-3954) TabletServer advertises existence before acquiring its lock
Date Tue, 11 Aug 2015 17:24:45 GMT


Josh Elser commented on ACCUMULO-3954:

bq. I think the code that's reading this data from ZooKeeper is doing it wrong. To find live tservers, you need to read the lock data from the children of Constants.ZTSERVERS.

That is what the Monitor is ultimately doing. It calls {{connector.instanceOperations().getTabletServers()}}
which uses:

{noformat}
String path = ZooUtil.getRoot(instance) + Constants.ZTSERVERS;
{noformat}
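The quoted suggestion is to treat a tserver as live only when lock data exists beneath its node under Constants.ZTSERVERS. Simply being listed as a child of that path is not enough. The sketch below shows that check without a ZooKeeper client. It models the subtree as an in-memory map, and it assumes a hypothetical `zlock-` prefix for lock child names. It is not Accumulo's actual `getTabletServers()` implementation. `LiveTServerFinder` and `LOCK_PREFIX` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of the ZooKeeper subtree under
 * ZooUtil.getRoot(instance) + Constants.ZTSERVERS: each key is an advertised
 * tserver node ("host:port") and each value lists that node's children.
 */
class LiveTServerFinder {

  // Assumption for this sketch only: a lock node is a child whose name starts with this prefix.
  static final String LOCK_PREFIX = "zlock-";

  /** Returns, in sorted order, only the advertised tservers that currently have a lock child. */
  static List<String> liveTServers(Map<String, List<String>> tserverChildren) {
    List<String> live = new ArrayList<>();
    // TreeMap gives a deterministic, sorted iteration order.
    for (Map.Entry<String, List<String>> entry : new TreeMap<>(tserverChildren).entrySet()) {
      boolean hasLock = entry.getValue().stream().anyMatch(c -> c.startsWith(LOCK_PREFIX));
      if (hasLock) {
        live.add(entry.getKey());
      }
    }
    return live;
  }
}
```

Suppose a tserver node exists but has no lock child yet, which is the window described in this issue. This check leaves that tserver out, even though it is already listed under the path.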

I still think the TabletServer code is wrong. We should first acquire our lock, and then advertise ourselves.
If we acquire the lock before advertising our existence, clients cannot (reasonably) know we exist until
we are ready, so they cannot get into the situation where the server throws this exception.
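The startup ordering argued for above can be modeled in a few lines. In this sketch, `TServerRegistry` stands in for the advertised tserver list that clients such as the master and monitor read. It records whether any server became visible before holding its lock, which is the window that produced "Lock not acquired". `TServerRegistry`, `TServerStartup`, and their methods are hypothetical and do not correspond to Accumulo classes. The lock acquisition is reduced to setting a flag.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the advertised tserver list that clients read. */
class TServerRegistry {
  private final Set<String> advertised = ConcurrentHashMap.newKeySet();
  private volatile boolean visibleBeforeLock = false;

  void advertise(String address, boolean lockHeld) {
    if (!lockHeld) {
      // A client could now send an RPC that the server will reject.
      visibleBeforeLock = true;
    }
    advertised.add(address);
  }

  boolean contains(String address) {
    return advertised.contains(address);
  }

  boolean sawServerVisibleBeforeLock() {
    return visibleBeforeLock;
  }
}

/** Hypothetical tserver that models only the startup ordering. */
class TServerStartup {
  private final String address;
  private final TServerRegistry registry;
  private volatile boolean lockHeld = false;

  TServerStartup(String address, TServerRegistry registry) {
    this.address = address;
    this.registry = registry;
  }

  /** Ordering described in the issue title: advertise, then acquire the lock. */
  void startAdvertiseFirst() {
    registry.advertise(address, lockHeld);
    acquireLock();
  }

  /** Proposed ordering: acquire the lock, then advertise. */
  void startLockFirst() {
    acquireLock();
    registry.advertise(address, lockHeld);
  }

  private void acquireLock() {
    // Stand-in for blocking until the tserver's ZooKeeper lock is held.
    lockHeld = true;
  }
}
```

With `startLockFirst()`, every server a client can discover already holds its lock, so the race disappears by construction. It does not have to be tolerated at RPC time.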

> TabletServer advertises existence before acquiring its lock
> -----------------------------------------------------------
>                 Key: ACCUMULO-3954
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.3, 1.7.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Trivial
>             Fix For: 1.8.0
> Noticed this error today on the monitor after restarting Accumulo:
> {noformat}
> 2015-08-06 16:57:10,788 [tserver.TabletServer] WARN : tserver:hostname Got getScans message from master before lock acquired, ignoring...
> 2015-08-06 16:57:10,791 [tserver.TabletServer$ThriftClientHandler] ERROR: tserver:jelser-phoenix-1.openstacklocal Lock not acquired
> java.lang.RuntimeException: Lock not acquired
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.checkPermission(
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.getActiveScans(
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>         at java.lang.reflect.Method.invoke(
>         at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(
>         at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(
>         at com.sun.proxy.$Proxy21.getActiveScans(Unknown Source)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$getActiveScans.getResult(
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$getActiveScans.getResult(
>         at org.apache.thrift.ProcessFunction.process(
>         at org.apache.thrift.TBaseProcessor.process(
>         at org.apache.accumulo.server.rpc.TimedProcessor.process(
>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(
>         at org.apache.accumulo.server.rpc.CustomNonBlockingServer$
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
>         at java.util.concurrent.ThreadPoolExecutor$
>         at
>         at
> {noformat}
> I don't think this should be bubbling up to the monitor as an error. I believe it is
> an entirely normal race condition. If the tabletserver is not yet ready to accept
> an RPC, it can log a message at debug level. The real error condition would be a
> tabletserver that never acquires its lock, which should be handled elsewhere.
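The description proposes that an RPC arriving before the lock is held should be logged at debug level rather than surfacing as an error. The sketch below shows that handling. It uses `java.util.logging`, where `Level.FINE` is the nearest analog of DEBUG, purely to stay self-contained. Accumulo itself does not log this way. `ScanHandler` and `TServerNotReadyException` are hypothetical names. The rejected call still throws so the caller can retry later; only the server-side log severity changes.

```java
import java.util.Collections;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical exception that callers can treat as "not ready yet, retry later". */
class TServerNotReadyException extends RuntimeException {
  TServerNotReadyException(String message) {
    super(message);
  }
}

/** Hypothetical RPC handler; only the readiness check is modeled. */
class ScanHandler {
  private static final Logger LOG = Logger.getLogger(ScanHandler.class.getName());
  private volatile boolean lockHeld = false;

  void setLockHeld(boolean held) {
    lockHeld = held;
  }

  /** Rejects an RPC that arrives before the lock is held, logging at FINE (debug) rather than SEVERE. */
  private void checkReady(String rpcName) {
    if (!lockHeld) {
      LOG.log(Level.FINE, "Got {0} before lock acquired, rejecting", rpcName);
      throw new TServerNotReadyException("Lock not acquired");
    }
  }

  List<String> getActiveScans() {
    checkReady("getActiveScans");
    // A real server would report its running scans; this sketch has none.
    return Collections.emptyList();
  }
}
```

A tserver that never acquires its lock would need separate detection, for example a startup timeout. This sketch does not attempt that.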

This message was sent by Atlassian JIRA
