accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3954) Demote warning/error about RPCs received in tabletserver before lock is acquired
Date Thu, 06 Aug 2015 17:15:07 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660377#comment-14660377
] 

Josh Elser commented on ACCUMULO-3954:
--------------------------------------

Actually, I'm wondering if we can actually fix the race condition in the TabletServer itself
in the {{announceExistence}} method. It appears that we:

1. Advertise our existence in zookeeper
2. Acquire TabletServer zookeeper lock

It seems like this is backwards and that we should acquire the lock first, and then tell the
world that we exist. The monitor saw the tabletserver existed (via step 1) and happened to
call getActiveScans before step 2 happened. If we reverse the order in the tabletserver, it
should be impossible for the monitor to cause this.

Also, it's worth mentioning that the log message is wrong too (the rpc came from the monitor,
not the master). We should just make the warning message more generic, e.g. "got $method message
before lock was acquired".

> Demote warning/error about RPCs received in tabletserver before lock is acquired
> --------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-3954
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3954
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.3, 1.7.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Trivial
>             Fix For: 1.8.0
>
>
> Noticed this error today on the monitor after restarting Accumulo:
> {noformat}
> 2015-08-06 16:57:10,788 [tserver.TabletServer] WARN : tserver:hostname Got getScans message
from master before lock acquired, ignoring...
> 2015-08-06 16:57:10,791 [tserver.TabletServer$ThriftClientHandler] ERROR: tserver:jelser-phoenix-1.openstacklocal
Lock not acquired
> java.lang.RuntimeException: Lock not acquired
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.checkPermission(TabletServer.java:1400)
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.getActiveScans(TabletServer.java:1622)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:39)
>         at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:47)
>         at com.sun.proxy.$Proxy21.getActiveScans(Unknown Source)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$getActiveScans.getResult(TabletClientService.java:2864)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$getActiveScans.getResult(TabletClientService.java:2848)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
>         at org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I don't think this should be bubbling up to the monitor as an error. I believe it is
an entirely normal race condition that can happen. If the tabletserver is not ready to accept
an RPC, it can log a message at debug. The error condition would be a tabletserver never acquiring
its lock (and thus should be handled elsewhere).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message