Date: Tue, 1 Dec 2015 18:28:11 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-4065) Strange temporary errors in Master after upgrade

    [ https://issues.apache.org/jira/browse/ACCUMULO-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034285#comment-15034285 ]

ASF GitHub Bot commented on ACCUMULO-4065:
------------------------------------------

Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/56#discussion_r46317203

    --- Diff: server/base/src/main/java/org/apache/accumulo/server/util/RpcWrapper.java ---
    @@ -36,20 +42,51 @@
      * @since 1.6.1
      */
     public class RpcWrapper {
    +  private static final Logger log = LoggerFactory.getLogger(RpcWrapper.class);
    +
    +  public static <T> T service(final T instance, @SuppressWarnings("rawtypes") final Map<String,ProcessFunction<T,? extends TBase>> processorView) {
    +    // Get a handle on the isOnewayMethod and make it accessible
    +    final Method isOnewayMethod;
    +    try {
    +      isOnewayMethod = ProcessFunction.class.getDeclaredMethod("isOneway");
    +    } catch (NoSuchMethodException e) {
    +      throw new RuntimeException("Could not access isOneway method", e);
    +    } catch (SecurityException e) {
    +      throw new RuntimeException("Could not access isOneway method", e);
    +    }
    +    isOnewayMethod.setAccessible(true);
    +
    +    final Set<String> onewayMethods = new HashSet<String>();
    +    for (@SuppressWarnings("rawtypes") Entry<String,ProcessFunction<T,? extends TBase>> entry : processorView.entrySet()) {
    +      try {
    +        if ((Boolean) isOnewayMethod.invoke(entry.getValue())) {
    +          onewayMethods.add(entry.getKey());
    +        }
    +      } catch (Exception e) {
    +        throw new RuntimeException(e);
    +      }
    +    }
    +    log.debug("Found oneway Thrift methods: " + onewayMethods);
    -  public static <T> T service(final T instance) {
         InvocationHandler handler = new RpcServerInvocationHandler(instance) {
           private final Logger log = LoggerFactory.getLogger(instance.getClass());
           @Override
           public Object invoke(Object obj, Method method, Object[] args) throws Throwable {
    +        // e.g. ThriftClientHandler.flush(TInfo, TCredentials, ...)
             try {
               return super.invoke(obj, method, args);
             } catch (RuntimeException e) {
    +          if (onewayMethods.contains(method.getName())) {
    --- End diff --

    ok, in that case you could add a sanity check to the code above that adds method names to the set. When add is called on the set, it should always return true.
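A minimal, standalone sketch of the sanity check being proposed, for illustration only. It does not use Thrift: each `(name, isOneway)` entry stands in for what the diff reads reflectively through `ProcessFunction.isOneway`, and the names `OnewaySanityCheck` and `collectOnewayMethods` are hypothetical. A `List` of entries is used instead of a `Map` only so that a duplicate name can actually be fed in, and `IllegalStateException` plays the role of the "WTF exception".

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not part of Accumulo or Thrift.
public class OnewaySanityCheck {

  // Each entry stands in for (Thrift method name, isOneway flag), which the
  // RpcWrapper diff obtains reflectively from ProcessFunction.isOneway().
  public static Set<String> collectOnewayMethods(List<Map.Entry<String, Boolean>> methods) {
    Set<String> onewayMethods = new HashSet<String>();
    for (Map.Entry<String, Boolean> entry : methods) {
      if (entry.getValue()) {
        // Set.add returns false when the name is already present. Method names
        // are expected to be unique, so false means an assumption is broken.
        if (!onewayMethods.add(entry.getKey())) {
          throw new IllegalStateException("Oneway method name seen twice: " + entry.getKey());
        }
      }
    }
    return Collections.unmodifiableSet(onewayMethods);
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Boolean>> methods = Arrays.<Map.Entry<String, Boolean>>asList(
        new SimpleEntry<String, Boolean>("flush", true),
        new SimpleEntry<String, Boolean>("startScan", false));
    System.out.println(collectOnewayMethods(methods)); // prints [flush]
  }
}
```

Because `Set.add` returns `false` only when the element is already present, the check costs nothing on the normal path and turns a silently ignored duplicate into an immediate failure while the set is being built.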
    If add returns false, then throw a new WTF exception.


> Strange temporary errors in Master after upgrade
> ------------------------------------------------
>
>                 Key: ACCUMULO-4065
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4065
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.6.4, 1.7.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.5, 1.7.1, 1.8.0
>
>
> I'm running into a problem that I saw quite a while back in ACCUMULO-3653.
> I'm still trying to understand what happened, but what I understand so far is that Accumulo was running, a newer version was installed beside the running version, Accumulo was stopped, the symlink changed, and the new version was started. After this, we started seeing a number of errors in the Master. Shortly after that, the cluster was restarted and the errors stopped happening.
> This is what I can extract from the logs:
> {noformat}
> 2015-11-19 22:42:47,115 [rpc.TServerUtils] DEBUG: Instantiating default, unsecure custom half-async Thrift server
> 2015-11-19 22:42:47,122 [master.Master] INFO : Started replication coordinator service at host3:10001
> 2015-11-19 22:42:47,158 [master.Master] ERROR: Error processing table state for store Normal Tablets
> java.lang.RuntimeException: java.lang.RuntimeException: Failed to create iterator
> 	at org.apache.accumulo.server.master.state.MetaDataTableScanner.<init>(MetaDataTableScanner.java:72)
> 	at org.apache.accumulo.server.master.state.MetaDataTableScanner.<init>(MetaDataTableScanner.java:56)
> 	at org.apache.accumulo.server.master.state.MetaDataStateStore.iterator(MetaDataStateStore.java:62)
> 	at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:172)
> Caused by: java.lang.RuntimeException: Failed to create iterator
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.<init>(TabletServerBatchReaderIterator.java:158)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReader.iterator(TabletServerBatchReader.java:115)
> 	at org.apache.accumulo.server.master.state.MetaDataTableScanner.<init>(MetaDataTableScanner.java:66)
> 	... 3 more
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host3:9997
> 	at org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:116)
> 	at org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablet(MetadataLocationObtainer.java:95)
> 	at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocation(TabletLocatorImpl.java:463)
> 	at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocationAndCheckLock(TabletLocatorImpl.java:634)
> 	at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:625)
> 	at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:280)
> 	at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:355)
> 	at org.apache.accumulo.core.client.impl.TimeoutTabletLocator.binRanges(TimeoutTabletLocator.java:100)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.binRanges(TabletServerBatchReaderIterator.java:233)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.lookup(TabletServerBatchReaderIterator.java:220)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.<init>(TabletServerBatchReaderIterator.java:154)
> 	... 5 more
> Caused by: org.apache.thrift.TApplicationException: Internal error processing flush
> 	at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:232)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:208)
> 	at org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:98)
> 	... 15 more
> 2015-11-19 22:42:47,178 [impl.ThriftScanner] DEBUG: Scan failed, not serving tablet (+r<<,host4:9997,35121a475360010)
> 2015-11-19 22:42:47,202 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : NotServingTabletException(extent:TKeyExtent(table:2B 72, endRow:null, prevEndRow:null))
> 2015-11-19 22:42:47,283 [impl.ThriftScanner] DEBUG: Scan failed, not serving tablet (+r<<,host4:9997,35121a475360010)
> 2015-11-19 22:42:47,372 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host4:9997 msg : startMultiScan failed: unknown result
> org.apache.thrift.TApplicationException: startMultiScan failed: unknown result
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:324)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:42:47,373 [impl.TabletServerBatchReaderIterator] WARN : Error on server host4:9997
> org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: startMultiScan failed: unknown result
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:324)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	... 6 more
> 2015-11-19 22:42:47,376 [master.Master] ERROR: Error processing table state for store Metadata Tablets
> java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.hasNext(TabletServerBatchReaderIterator.java:181)
> 	at org.apache.accumulo.server.master.state.MetaDataTableScanner.hasNext(MetaDataTableScanner.java:121)
> 	at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:173)
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: startMultiScan failed: unknown result
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:324)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	... 6 more
> {noformat}
> A bit later:
> {noformat}
> 2015-11-19 22:43:04,572 [recovery.RecoveryManager] DEBUG: Recovering hdfs://mycluster/apps/accumulo/data/wal/host4+9997/a2831ffa-c980-47bf-9f33-14716a0df6ec to hdfs://mycluster/apps/accumulo/data/recovery/a2831ffa-c980-47bf-9f33-14716a0df6ec
> 2015-11-19 22:43:04,575 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host4:9997 msg : closeMultiScan failed: out of sequence response
> org.apache.thrift.TApplicationException: closeMultiScan failed: out of sequence response
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_closeMultiScan(TabletClientService.java:371)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.closeMultiScan(TabletClientService.java:357)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:681)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:04,575 [impl.TabletServerBatchReaderIterator] WARN : Error on server host4:9997
> org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: closeMultiScan failed: out of sequence response
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_closeMultiScan(TabletClientService.java:371)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.closeMultiScan(TabletClientService.java:357)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:681)
> 	... 6 more
> 2015-11-19 22:43:04,576 [master.Master] ERROR: Error processing table state for store Metadata Tablets
> java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.hasNext(TabletServerBatchReaderIterator.java:181)
> 	at org.apache.accumulo.server.master.state.MetaDataTableScanner.hasNext(MetaDataTableScanner.java:121)
> 	at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:173)
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: closeMultiScan failed: out of sequence response
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_closeMultiScan(TabletClientService.java:371)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.closeMultiScan(TabletClientService.java:357)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:681)
> 	... 6 more
> 2015-11-19 22:43:04,882 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got c
> 2015-11-19 22:43:04,985 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 0
> 2015-11-19 22:43:05,089 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 16
> 2015-11-19 22:43:05,192 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffd6
> 2015-11-19 22:43:05,296 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got fffffff1
> 2015-11-19 22:43:05,399 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffb7
> 2015-11-19 22:43:05,502 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffe4
> 2015-11-19 22:43:05,605 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffff98
> 2015-11-19 22:43:05,687 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host4:9997 msg : Expected protocol id ffffff82 but got fffffff7
> org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got fffffff7
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:05,688 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got fffffff7
> java.io.IOException: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got fffffff7
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:702)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got fffffff7
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	... 6 more
> 2015-11-19 22:43:05,708 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffcf
> 2015-11-19 22:43:05,793 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host4:9997 msg : Expected protocol id ffffff82 but got ffffffc6
> org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffc6
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:05,794 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffc6
> java.io.IOException: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffc6
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:702)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffc6
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	... 6 more
> 2015-11-19 22:43:05,810 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffd4
> 2015-11-19 22:43:05,913 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 1
> 2015-11-19 22:43:05,960 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 1c
> 2015-11-19 22:43:05,997 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host4:9997 msg : Expected protocol id ffffff82 but got 19
> org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 19
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:05,998 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 19
> java.io.IOException: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 19
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:702)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 19
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	... 6 more
> 2015-11-19 22:43:06,006 [master.Master] WARN : Lost servers [host5:9997[25121a475480008]]
> {noformat}
> And even later:
> {noformat}
> 2015-11-19 22:43:41,810 [tracer.ZooTraceClient] DEBUG: Processing event for trace server zk watch
> 2015-11-19 22:43:41,812 [tracer.ZooTraceClient] DEBUG: Scanning trace hosts in zookeeper: /tracers
> 2015-11-19 22:43:41,813 [tracer.ZooTraceClient] DEBUG: Trace hosts: [10.240.0.76:12234, 10.240.0.76:12234]
> 2015-11-19 22:43:42,066 [impl.TabletServerBatchReaderIterator] WARN : null column family
> java.lang.IllegalArgumentException: null column family
> 	at org.apache.accumulo.core.data.Key.<init>(Key.java:391)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:42,070 [master.Master] ERROR: Error processing table state for store Metadata Tablets
> java.lang.IllegalArgumentException: null column family
> 	at org.apache.accumulo.core.data.Key.<init>(Key.java:391)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:43,178 [impl.TabletServerBatchReaderIterator] WARN : null column family
> java.lang.IllegalArgumentException: null column family
> 	at org.apache.accumulo.core.data.Key.<init>(Key.java:391)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:43,178 [master.Master] ERROR: Error processing table state for store Metadata Tablets
> java.lang.IllegalArgumentException: null column family
> 	at org.apache.accumulo.core.data.Key.<init>(Key.java:391)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:44,284 [impl.TabletServerBatchReaderIterator] WARN : null column family
> java.lang.IllegalArgumentException: null column family
> 	at org.apache.accumulo.core.data.Key.<init>(Key.java:391)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> And even more:
> {noformat}
> 2015-11-19 22:44:05,375 [recovery.RecoveryManager] DEBUG: Recovering hdfs://mycluster/apps/accumulo/data/wal/host4+9997/a2831ffa-c980-47bf-9f33-14716a0df6ec to hdfs://mycluster/apps/accumulo/data/recovery/a2831ffa-c980-47bf-9f33-14716a0df6ec
> 2015-11-19 22:44:05,385 [master.Master] DEBUG: 2 assigned to dead servers: [!0;~<@(null,host4:9997[35121a475360010],host4:9997[35121a475360010]), !0<;~@(null,host5:9997[25121a475480008],host5:9997[25121a475480008])]...
> 2015-11-19 22:44:05,405 [impl.TabletServerBatchWriter] ERROR: Server side error on host4:9997: org.apache.thrift.TApplicationException: startUpdate failed: unknown result > 2015-11-19 22:44:05,405 [master.Master] ERROR: Error processing table state for store Metadata Tablets > org.apache.accumulo.server.master.state.DistributedStoreException: org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0 security codes: {} # server errors 1 # exceptions 0 > at org.apache.accumulo.server.master.state.MetaDataStateStore.unassign(MetaDataStateStore.java:139) > at org.apache.accumulo.master.TabletGroupWatcher.flushChanges(TabletGroupWatcher.java:738) > at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:295) > Caused by: org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0 security codes: {} # server errors 1 # exceptions 0 > at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures(TabletServerBatchWriter.java:550) > at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.close(TabletServerBatchWriter.java:361) > at org.apache.accumulo.core.client.impl.BatchWriterImpl.close(BatchWriterImpl.java:54) > at org.apache.accumulo.server.master.state.MetaDataStateStore.unassign(MetaDataStateStore.java:137) > ... 
2 more > 2015-11-19 22:44:05,406 [impl.TabletServerBatchWriter] ERROR: Failed to send tablet server host4:9997 its batch : Error on server host4:9997 > org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997 > at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:950) > at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.access$1900(TabletServerBatchWriter.java:629) > at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.send(TabletServerBatchWriter.java:816) > at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.run(TabletServerBatchWriter.java:780) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.thrift.TApplicationException: startUpdate failed: unknown result > at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startUpdate(TabletClientService.java:403) > at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startUpdate(TabletClientService.java:381) > at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:893) > ... 9 more > {noformat} > And, curiously, after this exception, things seem to get happy: > {noformat} > 2015-11-19 22:46:35,247 [transport.TIOStreamTransport] WARN : Error closing output stream. 
> java.io.IOException: The stream is closed > at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:118) > at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > at java.io.FilterOutputStream.close(FilterOutputStream.java:158) > at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) > at org.apache.thrift.transport.TFramedTransport.close(TFramedTransport.java:89) > at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.close(ThriftTransportPool.java:309) > at org.apache.accumulo.core.client.impl.ThriftTransportPool.returnTransport(ThriftTransportPool.java:571) > at org.apache.accumulo.core.rpc.ThriftUtil.returnClient(ThriftUtil.java:147) > at org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:113) > at org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablet(MetadataLocationObtainer.java:95) > at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocation(TabletLocatorImpl.java:463) > at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocationAndCheckLock(TabletLocatorImpl.java:634) > at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:620) > at org.apache.accumulo.core.client.impl.TabletLocatorImpl.locateTablet(TabletLocatorImpl.java:439) > at org.apache.accumulo.core.client.impl.Writer.update(Writer.java:88) > at org.apache.accumulo.server.util.MetadataTableUtil.update(MetadataTableUtil.java:153) > at org.apache.accumulo.server.util.MetadataTableUtil.update(MetadataTableUtil.java:145) > at org.apache.accumulo.server.util.MetadataTableUtil.addTablet(MetadataTableUtil.java:211) > at org.apache.accumulo.master.tableOps.PopulateMetadata.call(PopulateMetadata.java:43) > at org.apache.accumulo.master.tableOps.PopulateMetadata.call(PopulateMetadata.java:25) > at 
org.apache.accumulo.master.tableOps.TraceRepo.call(TraceRepo.java:57) > at org.apache.accumulo.fate.Fate$TransactionRunner.run(Fate.java:72) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > 2015-11-19 22:46:35,249 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while wai > ting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.240.0.76:40610 remote=host4/10.240.0.77:9997] > 2015-11-19 22:46:35,258 [replication.ReplicationDriver] ERROR: Caught Exception trying to create Replication status records > java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host5:9997 > at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:161) > at org.apache.accumulo.master.replication.StatusMaker.run(StatusMaker.java:94) > at org.apache.accumulo.master.replication.ReplicationDriver.run(ReplicationDriver.java:87) > Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host5:9997 > at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:293) > at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:80) > at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:151) > ... 
2 more > Caused by: org.apache.thrift.TApplicationException: Internal error processing flush > at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71) > at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:232) > at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:208) > at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:410) > at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:285) > ... 4 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)