Date: Fri, 11 Jul 2014 19:21:05 +0000 (UTC)
From: "Josh Elser (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Updated] (ACCUMULO-2964) Unexpected ThriftSecurityException from BatchScanner

     [ https://issues.apache.org/jira/browse/ACCUMULO-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated ACCUMULO-2964:
---------------------------------
    Fix Version/s: 1.6.1

> Unexpected ThriftSecurityException from BatchScanner
> ----------------------------------------------------
>
>                 Key: ACCUMULO-2964
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2964
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: Josh Elser
>            Priority: Critical
>             Fix For: 1.6.1, 1.7.0
>
>
> This is something I've only seen a handful of times when writing/running tests that stop and restart tservers. After the tserver is restarted, there is a thread (typically running in the master) which is trying to read a table. As such, the thread will continue to poll until the tserver comes up.
> Very infrequently, the client gets a {{ThriftSecurityException}} with a code of {{DEFAULT_SECURITY_ERROR}} and a message of {{Unknown security exception}}. There is no additional information in the client log (from the thrift call inside the BatchScanner), and the tserver log contains no error messages at all.
> The error that the client saw:
> {noformat}
> 2014-07-01 04:18:18,971 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host:58090 msg : null
> ThriftSecurityException(user:!SYSTEM, code:null)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result$startMultiScan_resultStandardScheme.read(TabletClientService.java:10045)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result$startMultiScan_resultStandardScheme.read(TabletClientService.java:10022)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_result.read(TabletClientService.java:9961)
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:313)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:293)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:632)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:592)
>     at org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablets(MetadataLocationObtainer.java:181)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.processInvalidated(TabletLocatorImpl.java:667)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:337)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.processInvalidated(TabletLocatorImpl.java:660)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:610)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.locateTablet(TabletLocatorImpl.java:440)
>     at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:226)
>     at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:84)
>     at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:177)
>     at org.apache.accumulo.master.replication.DistributedWorkQueueWorkAssigner.createWork(DistributedWorkQueueWorkAssigner.java:161)
>     at org.apache.accumulo.master.replication.DistributedWorkQueueWorkAssigner.assignWork(DistributedWorkQueueWorkAssigner.java:140)
>     at org.apache.accumulo.master.replication.WorkDriver.run(WorkDriver.java:97)
> {noformat}
> The interesting part is that when the client saw this message, the new TabletServer was already started, and the old tabletserver appears to have been dead for 20s. So, the client in the master had been polling for 20s getting a ConnectException (connection refused), which is expected. I don't know why we got this exception after that length of time.
> The infrequency with which I see this makes me wonder if the random ports in the new tabletserver are somehow re-grabbing the old tserver's thrift client service port and something is unexpectedly being interpreted as this ThriftSecurityException? That's the only thing that seems remotely possible to me.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
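
The polling behaviour described in the report (keep retrying while the connection is refused, then an unexpected security exception surfaces once something answers on the port) can be illustrated with a minimal sketch. This is not Accumulo's client code: `pollUntilUp`, `flakyScan`, and the retry parameters are hypothetical names chosen for illustration, and `java.lang.SecurityException` merely stands in for `ThriftSecurityException`.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

final class PollUntilUp {

    /**
     * Hypothetical helper: retries the call while the server refuses the
     * connection (the expected case while a tserver restarts). Any other
     * exception is not retried and propagates to the caller unchanged.
     */
    static <T> T pollUntilUp(Callable<T> call, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (ConnectException e) {
                // Expected while the server is down: remember it and retry.
                last = e;
                Thread.sleep(sleepMillis);
            }
        }
        // Every attempt was refused; report the last refusal.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated server: refuses three times, then answers with a
        // security error (stand-in for the ThriftSecurityException).
        Callable<String> flakyScan = () -> {
            int n = ++calls[0];
            if (n <= 3) {
                throw new ConnectException("Connection refused");
            }
            if (n == 4) {
                throw new SecurityException("Unknown security exception");
            }
            return "scan result";
        };
        try {
            pollUntilUp(flakyScan, 10, 10L);
        } catch (SecurityException e) {
            System.out.println("stopped after " + calls[0] + " calls: " + e.getMessage());
        }
    }
}
```

In this sketch only connection refusals are treated as transient. Whether the real client should also retry on an unexplained security error, as the port-reuse hypothesis might suggest, is left open by the report.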