From: "Perko, Ralph J"
To: "user@accumulo.apache.org"
Date: Thu, 19 Jul 2012 11:55:03 -0700
Subject: Re: table data missing

Thanks for the help. It is fixed, and it was related to the loggers, as you said. Here is what I did:

Environment: 7-node managed cluster

Problem: The walogs directory was configured to use a shared directory that all the nodes used. When the loggers started, only the first one could acquire the .lock file there; the others failed to start, and my tables were not visible. (I am not sure why the tables disappeared, since the one running logger had access to the walog files.)

Solution: I created a new walogs directory on a partition unique to each node, copied the contents of the original walogs directory into the new walogs directory on each node, and restarted Accumulo. All the tables were back.

Thanks again,
Ralph

On 7/19/12 9:43 AM, "Eric Newton" wrote:

>You should have as many loggers as you have tablet servers.
>
>Your log recovery is failing because the loggers are not running.
>
>Please start all your loggers, and/or determine why they are going
>down. Then restart the master and the system should recover.
>
>-Eric
>
>On Thu, Jul 19, 2012 at 12:39 PM, Perko, Ralph J
>wrote:
>> From the master log file at startup:
>>
>> 9 08:38:40,612 [master.CoordinateRecoveryTask] WARN : Unable to recover 192.168.1.244:11224/65911601-d684-43e8-94b3-cdf959590298(java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused)
>> java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
>>     at org.apache.accumulo.server.tabletserver.log.RemoteLogger.(RemoteLogger.java:99)
>>     at org.apache.accumulo.server.master.CoordinateRecoveryTask$RecoveryJob.startCopy(CoordinateRecoveryTask.java:132)
>>     at org.apache.accumulo.server.master.CoordinateRecoveryTask$RecoveryJob.access$400(CoordinateRecoveryTask.java:114)
>>     at org.apache.accumulo.server.master.CoordinateRecoveryTask.recover(CoordinateRecoveryTask.java:289)
>>     at org.apache.accumulo.server.master.Master$TabletGroupWatcher.run(Master.java:1351)
>> Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:475)
>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:464)
>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.getTransport(ThriftTransportPool.java:441)
>>     at org.apache.accumulo.core.util.ThriftUtil.getClient(ThriftUtil.java:67)
>>     at org.apache.accumulo.server.tabletserver.log.RemoteLogger.(RemoteLogger.java:96)
>>     ... 4 more
>> Caused by: java.net.ConnectException: Connection refused
>>     at sun.nio.ch.Net.connect(Native Method)
>>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:500)
>>     at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:81)
>>     at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:65)
>>     at org.apache.accumulo.core.util.TTimeoutTransport.create(TTimeoutTransport.java:39)
>>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.createNewTransport(ThriftTransportPool.java:473)
>>     ... 8 more
>> 19 08:38:40,652 [master.CoordinateRecoveryTask] WARN : Recovery of 192.168.1.244:11224:65911601-d684-43e8-94b3-cdf959590298 failed
>> 19 08:38:45,071 [master.CoordinateRecoveryTask] INFO : Deleting recovery directory org.apache.hadoop.fs.FileStatus@75641fd
>> 19 09:08:40,848 [master.CoordinateRecoveryTask] WARN : Recovery taking too long, giving up
>> 19 09:08:40,848 [master.EventCoordinator] INFO : Log recovery 192.168.1.244:11224/65911601-d684-43e8-94b3-cdf959590298 complete
>>
>> On 7/19/12 9:34 AM, "Keith Turner" wrote:
>>
>>>What you are describing sounds like ZooKeeper is up and running (this
>>>is where table config info is stored, so that's why you can list
>>>tables), but no tablets are assigned to tablet servers. Need to
>>>determine why no tablets are assigned. Look in the master log for
>>>anything suspicious related to tablet assignment.
>>>
>>>On Thu, Jul 19, 2012 at 12:28 PM, Perko, Ralph J
>>>wrote:
>>>> Hi,
>>>>
>>>> I restarted my cluster and now the Accumulo Overview page says there
>>>> are 0 tables. However, when I go to the Table List page, all my tables
>>>> are listed with a status of "ONLINE" but nothing else. From the
>>>> Accumulo shell I cannot access any of my tables, but I can list them,
>>>> like on the web site. Hadoop is up and healthy. The tablet servers are
>>>> up, but each shows 0 for Hosted Tablets. Do you know what is causing
>>>> this and how to fix it?
>>>>
>>>> Thanks,
>>>> Ralph
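Ralph's diagnosis, that only the first logger could take the .lock file in the shared walogs directory, can be illustrated with a small Java model. This is a hypothetical sketch, not Accumulo's logger code: it assumes a lock is taken by atomically creating a file named `.lock` inside the walog directory, and the names `WalogDirLock`, `tryAcquire` and `simulate` are made up for illustration. With one directory shared by seven loggers only the first claim succeeds; with one directory per node every claim succeeds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical model of per-directory locking; NOT Accumulo's actual logger code.
final class WalogDirLock {

    // Claim `dir` for `owner` by atomically creating dir/.lock.
    // Files.createFile throws FileAlreadyExistsException if the file already
    // exists, so exactly one caller per directory can succeed.
    static boolean tryAcquire(Path dir, String owner) {
        try {
            Path lock = Files.createFile(dir.resolve(".lock"));
            Files.write(lock, owner.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // another "logger" already holds this directory
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Start `loggers` simulated loggers, either all pointing at one shared
    // directory (shared == true) or each at its own per-node directory.
    // Returns which loggers managed to take the lock and "start".
    static boolean[] simulate(int loggers, boolean shared) {
        try {
            Path root = Files.createTempDirectory("walogs");
            boolean[] started = new boolean[loggers];
            for (int i = 0; i < loggers; i++) {
                Path dir = shared ? root : Files.createDirectory(root.resolve("node" + i));
                started[i] = tryAcquire(dir, "logger-" + i);
            }
            return started;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 7 loggers, matching the 7-node cluster described above.
        System.out.println("shared:   " + Arrays.toString(simulate(7, true)));
        System.out.println("per-node: " + Arrays.toString(simulate(7, false)));
    }
}
```

Running `main` prints one `true` followed by six `false` values for the shared case and seven `true` values for the per-node case. The temporary directories are left in place, which is acceptable for a throwaway demonstration.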
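The master log in this thread shows `Connection refused` when the master tried to reach the logger at 192.168.1.244:11224, which is what a client sees when nothing is listening on that port. One quick way to find which nodes have no running logger is to attempt a TCP connection to each of them. The sketch below is a hypothetical helper (`LoggerPortCheck` is not part of Accumulo); it assumes the logger port is the 11224 that appears in the log, and it only checks that something accepts TCP connections on that port, not that it is a healthy logger.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic helper, not part of Accumulo: reports which hosts
// do not accept TCP connections on the given (logger) port.
final class LoggerPortCheck {

    // True if something accepts a TCP connection on host:port within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // ConnectException ("Connection refused"), SocketTimeoutException,
            // UnknownHostException, etc. all mean "not reachable" here.
            return false;
        }
    }

    // Returns "host:port" for every host that is not listening on `port`.
    static List<String> unreachable(List<String> hosts, int port, int timeoutMs) {
        List<String> down = new ArrayList<>();
        for (String host : hosts) {
            if (!isListening(host, port, timeoutMs)) {
                down.add(host + ":" + port);
            }
        }
        return down;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java LoggerPortCheck <port> <host> [<host> ...]");
            System.exit(1);
        }
        int port = Integer.parseInt(args[0]);
        List<String> hosts = Arrays.asList(args).subList(1, args.length);
        List<String> down = unreachable(hosts, port, 2000);
        System.out.println(down.isEmpty() ? "all hosts listening" : "not listening: " + down);
    }
}
```

Since Eric's rule of thumb is one logger per tablet server, a natural host list is the set of tablet server hosts, e.g. `java LoggerPortCheck 11224 <host1> <host2> ...`; any host reported as not listening is a node whose logger is probably down.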