Date: Wed, 26 Feb 2014 15:41:22 +0000 (UTC)
From: "Eric Newton (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-2408) metadata table not assigned after root table is loaded

    [ https://issues.apache.org/jira/browse/ACCUMULO-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913031#comment-13913031 ]

Eric Newton commented on ACCUMULO-2408:
---------------------------------------

Seems to be the same issue as ACCUMULO-1861.

> metadata table not assigned after root table is loaded
> ------------------------------------------------------
>
>                 Key: ACCUMULO-2408
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2408
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> During a nightly integration test run, BigRootTabletIT failed, timing out after 4 minutes:
> {noformat}
> java.lang.Exception: test timed out after 240000 milliseconds
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
>     at org.apache.accumulo.core.client.admin.TableOperationsImpl.addSplits(TableOperationsImpl.java:437)
>     at org.apache.accumulo.test.functional.BigRootTabletIT.test(BigRootTabletIT.java:50)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
> Looking at the logs, the root tablet is assigned successfully:
> {noformat}
> 2014-02-26 05:17:09,414 [state.ZooTabletStateStore] DEBUG: Returning root tablet state: +r<<@(tserver1:9997[1446db2884a0002],null,null)
> 2014-02-26 05:17:09,596 [master.EventCoordinator] INFO : tablet +r<< was loaded on tserver1:9997
> {noformat}
> No other tablets are assigned for the next four minutes.
> The logs are full of "Failed to bin" errors:
> {noformat}
> 2014-02-26 05:19:09,613 [impl.ThriftTransportPool] TRACE: Using existing connection to tserver1:9997
> 2014-02-26 05:19:09,615 [impl.ThriftTransportPool] TRACE: Returned connection tserver1:9997 (120000) ioCount : 562
> 2014-02-26 05:19:09,615 [metadata.MetadataLocationObtainer] TRACE: tid=28 oid=3448 Got 2 results from +r<< in 0.002 secs
> 2014-02-26 05:19:09,615 [impl.TabletLocatorImpl] TRACE: tid=28 oid=3446 Binned 1 ranges for table !0 to 0 tservers in 0.003 secs
> 2014-02-26 05:19:09,616 [impl.TabletServerBatchReaderIterator] TRACE: Failed to bin 1 ranges, tablet locations were null, retrying in 100ms
> {noformat}
> There is an IOException when trying to do a batch read:
> {noformat}
> 2014-02-26 05:19:09,687 [impl.TabletServerBatchReaderIterator] DEBUG: Server : tserver1:9997 msg : java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
> 2014-02-26 05:19:09,689 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
> java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
>     at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:713)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:372)
>     at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>     at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
>     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>     at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
>     at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
>     at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:311)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:291)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:658)
>     ... 7 more
> Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>     at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>     ... 18 more
> 2014-02-26 05:19:09,693 [impl.TabletServerBatchReaderIterator] TRACE: Failed to execute multiscans against 1 tablets, retrying...
> {noformat}
> This would appear to be the batch scanner used to read the root table in the master.
> The tablet server hosting the root tablet is being successfully scanned more than 24x a second, presumably from clients.
> There are no errors in the tserver logs.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)