accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-3954) TabletServer advertises existence before acquiring its lock
Date Wed, 12 Aug 2015 15:46:45 GMT


Josh Elser commented on ACCUMULO-3954:

bq. The lock node should be the advertisement. The code in InstanceOperationsImpl.getTabletServers()
seems to be reading the lock node

Ah, I see it now. I missed that {{IOI}} did a second getChildren call.
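
To illustrate the two-step lookup discussed above, here is a minimal sketch of treating the lock node as the advertisement. It is not the actual {{InstanceOperationsImpl}} code: the in-memory tree, the class and method names, the paths, and the node names are all hypothetical stand-ins for ZooKeeper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LockNodeListingSketch {
    // Hypothetical stand-in for a ZooKeeper-like tree: path -> child names.
    private final Map<String, List<String>> children = new TreeMap<>();

    public void addNode(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    public List<String> getChildren(String path) {
        return children.getOrDefault(path, Collections.emptyList());
    }

    // A server counts as advertised only if its node has a child (the lock node).
    public List<String> liveServers(String root) {
        List<String> live = new ArrayList<>();
        for (String server : getChildren(root)) {               // first getChildren call
            if (!getChildren(root + "/" + server).isEmpty()) {  // second call: is a lock node present?
                live.add(server);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        LockNodeListingSketch zk = new LockNodeListingSketch();
        zk.addNode("/tservers", "host1:9997");
        zk.addNode("/tservers", "host2:9997");
        // Only host1 has acquired its (hypothetical) lock node.
        zk.addNode("/tservers/host1:9997", "zlock-0000000001");
        System.out.println(zk.liveServers("/tservers")); // prints [host1:9997]
    }
}
```

In this sketch a server whose node exists but has no lock child is simply not listed, which is the behaviour the second getChildren call makes possible.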

bq. One possibility is that something like the following is happening:

I wrote that I saw this after a restart, so your assessment is possible. It was an Ambari
installation, so there was likely more downtime in between stopping and then starting the
process than using and then The logic is more like {{for p in $procs;
do stop $p; start $p; done}}. This could exacerbate the situation you outlined.

I'm guessing that this is just a scary-looking error that shouldn't be so scary-looking after

Thanks for bearing with me, Keith.

> TabletServer advertises existence before acquiring its lock
> -----------------------------------------------------------
>                 Key: ACCUMULO-3954
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.3, 1.7.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Trivial
>             Fix For: 1.8.0
> Noticed this error today on the monitor after restarting Accumulo:
> {noformat}
> 2015-08-06 16:57:10,788 [tserver.TabletServer] WARN : tserver:hostname Got getScans message from master before lock acquired, ignoring...
> 2015-08-06 16:57:10,791 [tserver.TabletServer$ThriftClientHandler] ERROR: tserver:jelser-phoenix-1.openstacklocal Lock not acquired
> java.lang.RuntimeException: Lock not acquired
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.checkPermission(
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.getActiveScans(
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>         at java.lang.reflect.Method.invoke(
>         at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(
>         at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(
>         at com.sun.proxy.$Proxy21.getActiveScans(Unknown Source)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$getActiveScans.getResult(
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$getActiveScans.getResult(
>         at org.apache.thrift.ProcessFunction.process(
>         at org.apache.thrift.TBaseProcessor.process(
>         at org.apache.accumulo.server.rpc.TimedProcessor.process(
>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(
>         at org.apache.accumulo.server.rpc.CustomNonBlockingServer$
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
>         at java.util.concurrent.ThreadPoolExecutor$
>         at
>         at
> {noformat}
> I don't think this should be bubbling up to the monitor as an error. I believe it is
> an entirely normal race condition that can happen. If the tabletserver is not ready to accept
> an RPC, it can log a message at debug. The error condition would be a tabletserver never acquiring
> its lock (and thus should be handled elsewhere).
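
The suggestion in the description could look roughly like the sketch below. It is not the actual {{TabletServer}} code: the class, the {{lockAcquired}} flag, and the returned values are hypothetical, {{java.util.logging}}'s {{FINE}} level stands in for debug, and returning an empty result to the caller is an assumption about how the request would be ignored.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

class LockGuardSketch {
    private static final Logger LOG = Logger.getLogger(LockGuardSketch.class.getName());

    // Hypothetical flag, set once the server holds its lock.
    private final AtomicBoolean lockAcquired = new AtomicBoolean(false);

    public void markLockAcquired() {
        lockAcquired.set(true);
    }

    // Hypothetical RPC handler standing in for something like getActiveScans.
    public List<String> getActiveScans() {
        if (!lockAcquired.get()) {
            // Expected during startup: log quietly instead of throwing a RuntimeException.
            LOG.log(Level.FINE, "Got getActiveScans request before lock acquired, ignoring");
            return Collections.emptyList();
        }
        return Collections.singletonList("scan-1"); // placeholder result
    }

    public static void main(String[] args) {
        LockGuardSketch server = new LockGuardSketch();
        System.out.println(server.getActiveScans()); // prints []
        server.markLockAcquired();
        System.out.println(server.getActiveScans()); // prints [scan-1]
    }
}
```

A server that never acquires its lock would still need to be detected somewhere else, as the description notes; this guard only keeps the normal startup race out of the error log.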

This message was sent by Atlassian JIRA
