accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4065) Strange temporary errors in Master after upgrade
Date Thu, 26 Nov 2015 19:25:11 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029205#comment-15029205 ]

Josh Elser commented on ACCUMULO-4065:
--------------------------------------

bq. So, simply making a round-trip RPC before a one-way finishes causes an error?

Right. One-way calls still have a return message on the RPC even though the caller doesn't
expect or wait for it. I believe this is the root of both our synchronization issues (we unreserve
the connection before it's truly unused) and the out-of-sequence errors in Thrift itself.
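
To make that failure mode concrete, here is a minimal sketch. It does not use the Thrift API: `ToyConnection` and its methods are hypothetical, and the model simply assumes (as described above) that a one-way call leaves a reply on the connection that nobody reads. The next round-trip call on the same connection then picks up that stale reply and sees the wrong sequence id.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OneWaySequenceDemo {

    /** Hypothetical toy connection: replies queue up in arrival order, tagged with a sequence id. */
    public static final class ToyConnection {
        private final Deque<Integer> pendingReplies = new ArrayDeque<>();
        private int nextSeqId = 0;

        /** One-way call: the request goes out, and the (assumed) reply is never read by the caller. */
        public void callOneWay() {
            int seqId = ++nextSeqId;
            pendingReplies.addLast(seqId); // reply stays on the connection, unread
        }

        /** Round-trip call: sends a request, then reads the next reply and checks its sequence id. */
        public void callRoundTrip() {
            int seqId = ++nextSeqId;
            pendingReplies.addLast(seqId);
            int received = pendingReplies.removeFirst();
            if (received != seqId) {
                throw new IllegalStateException(
                    "out of sequence response: expected " + seqId + " but got " + received);
            }
        }
    }

    public static void main(String[] args) {
        ToyConnection conn = new ToyConnection();
        conn.callOneWay(); // leaves an unread reply behind
        try {
            conn.callRoundTrip(); // reads the stale reply from the one-way call
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A connection that only ever makes round-trip calls never trips the check in this model; the mismatch appears only once the two kinds of call share a connection.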


bq. That seems unexpected. Did you post the test to the thrift project?

I think it's user error on our part. The way I read the documentation on one-way methods,
many one-way calls can share a connection, but one-way and non-one-way calls cannot.
I'd be curious to see how you read it. IMO, less ambiguous docs would serve everyone well
here, but it didn't seem like a Thrift bug to me.

bq. Do you have any idea if this can be fixed/patched to work?

Yeah, I have an idea. I believe that if we separate cached connections into one-way and
non-one-way and make sure callers request the correct kind, this will fix the issue. I'm hoping
to mock up something tomorrow or next week.
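
A rough sketch of that idea, to show the shape rather than the actual patch: idle connections are cached under a key that includes both the server address and whether the connection is used for one-way calls. The names here (`SplitConnectionCache`, `Key`, `Conn`, `reserve`, `unreserve`) are hypothetical and are not Accumulo's existing transport pool API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class SplitConnectionCache {

    /** Cache key: server address plus whether the connection is reserved for one-way calls. */
    public static final class Key {
        final String server;
        final boolean oneWay;

        Key(String server, boolean oneWay) {
            this.server = server;
            this.oneWay = oneWay;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return oneWay == other.oneWay && server.equals(other.server);
        }

        @Override
        public int hashCode() {
            return Objects.hash(server, oneWay);
        }
    }

    /** Stand-in for a real transport; it only remembers which cache bucket it belongs to. */
    public static final class Conn {
        final String server;
        final boolean oneWay;

        Conn(String server, boolean oneWay) {
            this.server = server;
            this.oneWay = oneWay;
        }
    }

    private final Map<Key, Deque<Conn>> idle = new HashMap<>();

    /** Returns an idle connection of the requested kind, or creates a new one if none is cached. */
    public synchronized Conn reserve(String server, boolean oneWay) {
        Deque<Conn> queue = idle.get(new Key(server, oneWay));
        if (queue != null && !queue.isEmpty()) {
            return queue.removeFirst();
        }
        return new Conn(server, oneWay);
    }

    /** Returns a connection to the bucket it was created for, never to the other kind. */
    public synchronized void unreserve(Conn conn) {
        idle.computeIfAbsent(new Key(conn.server, conn.oneWay), k -> new ArrayDeque<>())
            .addLast(conn);
    }
}
```

With this split, a caller asking for a round-trip connection to `host4:9997` is never handed a connection that previously carried a one-way call, and vice versa.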

> Strange temporary errors in Master after upgrade
> ------------------------------------------------
>
>                 Key: ACCUMULO-4065
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4065
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.7.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>
> I'm running into a problem that I saw quite a while back in ACCUMULO-3653.
> I'm still trying to understand what happened, but what I understand so far is that Accumulo
was running, a newer version was installed beside the running version, Accumulo was stopped,
the symlink was changed, and the new version was started. After this, we started seeing a number
of errors in the Master. Shortly after that, the cluster was restarted and the errors stopped
happening.
> This is what I can extract from the logs:
> {noformat}
> 2015-11-19 22:42:47,115 [rpc.TServerUtils] DEBUG: Instantiating default, unsecure custom
half-async Thrift server
> 2015-11-19 22:42:47,122 [master.Master] INFO : Started replication coordinator service
at host3:10001
> 2015-11-19 22:42:47,158 [master.Master] ERROR: Error processing table state for store
Normal Tablets
> java.lang.RuntimeException: java.lang.RuntimeException: Failed to create iterator
> 	at org.apache.accumulo.server.master.state.MetaDataTableScanner.<init>(MetaDataTableScanner.java:72)
> 	at org.apache.accumulo.server.master.state.MetaDataTableScanner.<init>(MetaDataTableScanner.java:56)
> 	at org.apache.accumulo.server.master.state.MetaDataStateStore.iterator(MetaDataStateStore.java:62)
> 	at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:172)
> Caused by: java.lang.RuntimeException: Failed to create iterator
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.<init>(TabletServerBatchReaderIterator.java:158)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReader.iterator(TabletServerBatchReader.java:115)
> 	at org.apache.accumulo.server.master.state.MetaDataTableScanner.<init>(MetaDataTableScanner.java:66)
> 	... 3 more
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server
host3:9997
> 	at org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:116)
> 	at org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablet(MetadataLocationObtainer.java:95)
> 	at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocation(TabletLocatorImpl.java:463)
> 	at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocationAndCheckLock(TabletLocatorImpl.java:634)
> 	at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:625)
> 	at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:280)
> 	at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:355)
> 	at org.apache.accumulo.core.client.impl.TimeoutTabletLocator.binRanges(TimeoutTabletLocator.java:100)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.binRanges(TabletServerBatchReaderIterator.java:233)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.lookup(TabletServerBatchReaderIterator.java:220)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.<init>(TabletServerBatchReaderIterator.java:154)
> 	... 5 more
> Caused by: org.apache.thrift.TApplicationException: Internal error processing flush
> 	at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:232)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:208)
> 	at org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:98)
> 	... 15 more
> 2015-11-19 22:42:47,178 [impl.ThriftScanner] DEBUG: Scan failed, not serving tablet (+r<<,host4:9997,35121a475360010)
> 2015-11-19 22:42:47,202 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: NotServingTabletException(extent:TKeyExtent(table:2B 72, endRow:null, prevEndRow:null))
> 2015-11-19 22:42:47,283 [impl.ThriftScanner] DEBUG: Scan failed, not serving tablet (+r<<,host4:9997,35121a475360010)
> 2015-11-19 22:42:47,372 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host4:9997
msg : startMultiScan failed: unknown result
> org.apache.thrift.TApplicationException: startMultiScan failed: unknown result
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:324)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:42:47,373 [impl.TabletServerBatchReaderIterator] WARN : Error on server
host4:9997
> org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: startMultiScan failed: unknown result
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:324)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	... 6 more
> 2015-11-19 22:42:47,376 [master.Master] ERROR: Error processing table state for store
Metadata Tablets
> java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException:
Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.hasNext(TabletServerBatchReaderIterator.java:181)
> 	at org.apache.accumulo.server.master.state.MetaDataTableScanner.hasNext(MetaDataTableScanner.java:121)
> 	at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:173)
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server
host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: startMultiScan failed: unknown result
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:324)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	... 6 more
> {noformat}
> A bit later:
> {noformat}
> 2015-11-19 22:43:04,572 [recovery.RecoveryManager] DEBUG: Recovering hdfs://mycluster/apps/accumulo/data/wal/host4+9997/a2831ffa-c980-47bf-9f33-14716a0df6ec
to hdfs://mycluster/apps/accumulo/data/recovery/a2831ffa-c980-47bf-9f33-14716a0df6ec
> 2015-11-19 22:43:04,575 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host4:9997
msg : closeMultiScan failed: out of sequence response
> org.apache.thrift.TApplicationException: closeMultiScan failed: out of sequence response
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_closeMultiScan(TabletClientService.java:371)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.closeMultiScan(TabletClientService.java:357)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:681)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:04,575 [impl.TabletServerBatchReaderIterator] WARN : Error on server
host4:9997
> org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: closeMultiScan failed: out of sequence
response
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_closeMultiScan(TabletClientService.java:371)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.closeMultiScan(TabletClientService.java:357)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:681)
> 	... 6 more
> 2015-11-19 22:43:04,576 [master.Master] ERROR: Error processing table state for store
Metadata Tablets
> java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException:
Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.hasNext(TabletServerBatchReaderIterator.java:181)
> 	at org.apache.accumulo.server.master.state.MetaDataTableScanner.hasNext(MetaDataTableScanner.java:121)
> 	at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:173)
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server
host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: closeMultiScan failed: out of sequence
response
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_closeMultiScan(TabletClientService.java:371)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.closeMultiScan(TabletClientService.java:357)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:681)
> 	... 6 more
> 2015-11-19 22:43:04,882 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got c
> 2015-11-19 22:43:04,985 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 0
> 2015-11-19 22:43:05,089 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 16
> 2015-11-19 22:43:05,192 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffd6
> 2015-11-19 22:43:05,296 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got fffffff1
> 2015-11-19 22:43:05,399 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffb7
> 2015-11-19 22:43:05,502 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffe4
> 2015-11-19 22:43:05,605 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffff98
> 2015-11-19 22:43:05,687 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host4:9997
msg : Expected protocol id ffffff82 but got fffffff7
> org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got
fffffff7
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:05,688 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.protocol.TProtocolException:
Expected protocol id ffffff82 but got fffffff7
> java.io.IOException: org.apache.thrift.protocol.TProtocolException: Expected protocol
id ffffff82 but got fffffff7
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:702)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82
but got fffffff7
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	... 6 more
> 2015-11-19 22:43:05,708 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffcf
> 2015-11-19 22:43:05,793 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host4:9997
msg : Expected protocol id ffffff82 but got ffffffc6
> org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got
ffffffc6
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:05,794 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.protocol.TProtocolException:
Expected protocol id ffffff82 but got ffffffc6
> java.io.IOException: org.apache.thrift.protocol.TProtocolException: Expected protocol
id ffffff82 but got ffffffc6
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:702)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82
but got ffffffc6
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	... 6 more
> 2015-11-19 22:43:05,810 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffffd4
> 2015-11-19 22:43:05,913 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 1
> 2015-11-19 22:43:05,960 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997
: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got 1c
> 2015-11-19 22:43:05,997 [impl.TabletServerBatchReaderIterator] DEBUG: Server : host4:9997
msg : Expected protocol id ffffff82 but got 19
> org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got
19
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:05,998 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.protocol.TProtocolException:
Expected protocol id ffffff82 but got 19
> java.io.IOException: org.apache.thrift.protocol.TProtocolException: Expected protocol
id ffffff82 but got 19
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:702)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82
but got 19
> 	at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634)
> 	... 6 more
> 2015-11-19 22:43:06,006 [master.Master] WARN : Lost servers [host5:9997[25121a475480008]]
> {noformat}
> And even later:
> {noformat}
> 2015-11-19 22:43:41,810 [tracer.ZooTraceClient] DEBUG: Processing event for trace server
zk watch
> 2015-11-19 22:43:41,812 [tracer.ZooTraceClient] DEBUG: Scanning trace hosts in zookeeper:
/tracers
> 2015-11-19 22:43:41,813 [tracer.ZooTraceClient] DEBUG: Trace hosts: [10.240.0.76:12234,
10.240.0.76:12234]
> 2015-11-19 22:43:42,066 [impl.TabletServerBatchReaderIterator] WARN : null column family
> java.lang.IllegalArgumentException: null column family
> 	at org.apache.accumulo.core.data.Key.<init>(Key.java:391)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:42,070 [master.Master] ERROR: Error processing table state for store
Metadata Tablets
> java.lang.IllegalArgumentException: null column family
> 	at org.apache.accumulo.core.data.Key.<init>(Key.java:391)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:43,178 [impl.TabletServerBatchReaderIterator] WARN : null column family
> java.lang.IllegalArgumentException: null column family
> 	at org.apache.accumulo.core.data.Key.<init>(Key.java:391)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:43,178 [master.Master] ERROR: Error processing table state for store
Metadata Tablets
> java.lang.IllegalArgumentException: null column family
> 	at org.apache.accumulo.core.data.Key.<init>(Key.java:391)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:43:44,284 [impl.TabletServerBatchReaderIterator] WARN : null column family
> java.lang.IllegalArgumentException: null column family
> 	at org.apache.accumulo.core.data.Key.<init>(Key.java:391)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349)
> 	at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> And even more:
> {noformat}
> 2015-11-19 22:44:05,375 [recovery.RecoveryManager] DEBUG: Recovering hdfs://mycluster/apps/accumulo/data/wal/host4+9997/a2831ffa-c980-47bf-9f33-14716a0df6ec
to hdfs://mycluster/apps/accumulo/data/recovery/a2831ffa-c980-47bf-9f33-14716a0df6ec
> 2015-11-19 22:44:05,385 [master.Master] DEBUG: 2 assigned to dead servers: [!0;~<@(null,host4:9997[35121a475360010],host4:9997[35121a475360010]),
!0<;~@(null,host5:9997[25121a475480008],host5:9997[25121a475480008])]...
> 2015-11-19 22:44:05,405 [impl.TabletServerBatchWriter] ERROR: Server side error on host4:9997:
org.apache.thrift.TApplicationException: startUpdate failed: unknown result
> 2015-11-19 22:44:05,405 [master.Master] ERROR: Error processing table state for store
Metadata Tablets
> org.apache.accumulo.server.master.state.DistributedStoreException: org.apache.accumulo.core.client.MutationsRejectedException:
# constraint violations : 0  security codes: {}  # server errors 1 # exceptions 0
> 	at org.apache.accumulo.server.master.state.MetaDataStateStore.unassign(MetaDataStateStore.java:139)
> 	at org.apache.accumulo.master.TabletGroupWatcher.flushChanges(TabletGroupWatcher.java:738)
> 	at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:295)
> Caused by: org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations
: 0  security codes: {}  # server errors 1 # exceptions 0
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures(TabletServerBatchWriter.java:550)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.close(TabletServerBatchWriter.java:361)
> 	at org.apache.accumulo.core.client.impl.BatchWriterImpl.close(BatchWriterImpl.java:54)
> 	at org.apache.accumulo.server.master.state.MetaDataStateStore.unassign(MetaDataStateStore.java:137)
> 	... 2 more
> 2015-11-19 22:44:05,406 [impl.TabletServerBatchWriter] ERROR: Failed to send tablet server host4:9997 its batch : Error on server host4:9997
> org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:950)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.access$1900(TabletServerBatchWriter.java:629)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.send(TabletServerBatchWriter.java:816)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.run(TabletServerBatchWriter.java:780)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: startUpdate failed: unknown result
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startUpdate(TabletClientService.java:403)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startUpdate(TabletClientService.java:381)
> 	at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:893)
> 	... 9 more
> {noformat}
> And, curiously, after this exception, things seem to get happy:
> {noformat}
> 2015-11-19 22:46:35,247 [transport.TIOStreamTransport] WARN : Error closing output stream.
> java.io.IOException: The stream is closed
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:118)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
>         at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
>         at org.apache.thrift.transport.TFramedTransport.close(TFramedTransport.java:89)
>         at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.close(ThriftTransportPool.java:309)
>         at org.apache.accumulo.core.client.impl.ThriftTransportPool.returnTransport(ThriftTransportPool.java:571)
>         at org.apache.accumulo.core.rpc.ThriftUtil.returnClient(ThriftUtil.java:147)
>         at org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:113)
>         at org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablet(MetadataLocationObtainer.java:95)
>         at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocation(TabletLocatorImpl.java:463)
>         at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocationAndCheckLock(TabletLocatorImpl.java:634)
>         at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:620)
>         at org.apache.accumulo.core.client.impl.TabletLocatorImpl.locateTablet(TabletLocatorImpl.java:439)
>         at org.apache.accumulo.core.client.impl.Writer.update(Writer.java:88)
>         at org.apache.accumulo.server.util.MetadataTableUtil.update(MetadataTableUtil.java:153)
>         at org.apache.accumulo.server.util.MetadataTableUtil.update(MetadataTableUtil.java:145)
>         at org.apache.accumulo.server.util.MetadataTableUtil.addTablet(MetadataTableUtil.java:211)
>         at org.apache.accumulo.master.tableOps.PopulateMetadata.call(PopulateMetadata.java:43)
>         at org.apache.accumulo.master.tableOps.PopulateMetadata.call(PopulateMetadata.java:25)
>         at org.apache.accumulo.master.tableOps.TraceRepo.call(TraceRepo.java:57)
>         at org.apache.accumulo.fate.Fate$TransactionRunner.run(Fate.java:72)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:46:35,249 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.240.0.76:40610 remote=host4/10.240.0.77:9997]
> 2015-11-19 22:46:35,258 [replication.ReplicationDriver] ERROR: Caught Exception trying to create Replication status records
> java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host5:9997
>         at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:161)
>         at org.apache.accumulo.master.replication.StatusMaker.run(StatusMaker.java:94)
>         at org.apache.accumulo.master.replication.ReplicationDriver.run(ReplicationDriver.java:87)
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host5:9997
>         at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:293)
>         at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:80)
>         at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:151)
>         ... 2 more
> Caused by: org.apache.thrift.TApplicationException: Internal error processing flush
>         at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:232)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:208)
>         at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:410)
>         at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:285)
>         ... 4 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
