Date: Fri, 15 May 2015 15:45:01 +0000 (UTC)
From: "Eric Newton (JIRA)"
Reply-To: jira@apache.org
To: notifications@accumulo.apache.org
Subject: [jira] [Updated] (ACCUMULO-3774) Deadlock after recovering root tablet

     [ https://issues.apache.org/jira/browse/ACCUMULO-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton updated ACCUMULO-3774:
----------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: ACCUMULO-3423

> Deadlock after recovering root tablet
> -------------------------------------
>
>                 Key: ACCUMULO-3774
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3774
>             Project: Accumulo
>          Issue Type: Sub-task
>         Environment: Hadoop 2.7.0, ZK 3.4.6, Accumulo 83d1b8388ad807d678c9a3a922e5025faa9a5933, 20 node m3.large EC2 cluster
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>            Priority: Blocker
>              Labels: 1.7.0_QA
>             Fix For: 1.8.0
>
>         Attachments: ACCUMULO-3774-01.patch
>
>
> I started CI running against 1.7.0-SNAP. After CI had run for a while, I started agitation, and then everything froze up. The node hosting the root tablet was killed, the root tablet had a lot of walogs (will open a separate issue for this), and the root tablet was reloaded on another machine. However, it hung while loading: the minor compaction after recovery was trying to write to the root tablet before the root tablet location was set.
> {noformat}
> "Minor compacting +r<<" daemon prio=10 tid=0x00000000046cd800 nid=0x3508 in Object.wait() [0x00007fb0ac3b1000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.waitRTE(TabletServerBatchWriter.java:459)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.close(TabletServerBatchWriter.java:352)
>         - locked <0x000000078d154840> (a org.apache.accumulo.core.client.impl.TabletServerBatchWriter)
>         at org.apache.accumulo.core.client.impl.BatchWriterImpl.close(BatchWriterImpl.java:54)
>         at org.apache.accumulo.server.util.MetadataTableUtil.markLogUnused(MetadataTableUtil.java:1131)
>         at org.apache.accumulo.tserver.TabletServer.markUnusedWALs(TabletServer.java:3032)
>         at org.apache.accumulo.tserver.TabletServer.minorCompactionFinished(TabletServer.java:2917)
>         at org.apache.accumulo.tserver.tablet.DatafileManager.bringMinorCompactionOnline(DatafileManager.java:440)
>         at org.apache.accumulo.tserver.tablet.Tablet.minorCompact(Tablet.java:956)
>         at org.apache.accumulo.tserver.tablet.MinorCompactionTask.run(MinorCompactionTask.java:84)
>         at org.apache.accumulo.tserver.tablet.Tablet.minorCompactNow(Tablet.java:1080)
>         at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2124)
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$3.run(TabletServer.java:1510)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)