X-Original-To: apmail-accumulo-notifications-archive@minotaur.apache.org
Delivered-To: apmail-accumulo-notifications-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id 22BAE172B0
    for ; Wed, 22 Apr 2015 19:22:01 +0000 (UTC)
Received: (qmail 81885 invoked by uid 500); 22 Apr 2015 19:22:00 -0000
Delivered-To: apmail-accumulo-notifications-archive@accumulo.apache.org
Received: (qmail 81854 invoked by uid 500); 22 Apr 2015 19:22:00 -0000
Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@apache.org
Delivered-To: mailing list notifications@accumulo.apache.org
Received: (qmail 81842 invoked by uid 99); 22 Apr 2015 19:22:00 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
    by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 22 Apr 2015 19:22:00 +0000
Date: Wed, 22 Apr 2015 19:22:00 +0000 (UTC)
From: "Keith Turner (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Updated] (ACCUMULO-3745) deadlock in SourceSwitchingIterator
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

     [ https://issues.apache.org/jira/browse/ACCUMULO-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-3745:
-----------------------------------
    Attachment: ACCUMULO-3745-1.patch

[~ecn] and I sat together and created this patch. We were not sure how to test it. We both visually analysed all of the locking to ensure that locks are always acquired in the same order.

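The ordering rule mentioned in the comment above can be illustrated with a minimal sketch. This is not the attached patch: the class `LockOrderSketch`, its fields, and the choice of which lock is taken first are all hypothetical stand-ins. It only shows the general technique of having every code path acquire the same two monitors in the same order, so that a scan-like thread and a compactor-like thread cannot each hold one monitor while waiting for the other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Accumulo code: two monitors that are always
// acquired in one fixed order (copies first, then iteratorLock).
class LockOrderSketch {
    // Stand-in for the shared list of copies (first lock in the order).
    private final List<Object> copies = new ArrayList<>();
    // Stand-in for the iterator's own monitor (second lock in the order).
    private final Object iteratorLock = new Object();
    private int switches = 0;

    // Path taken by the scan-like thread.
    void deepCopy() {
        synchronized (copies) {
            synchronized (iteratorLock) {
                copies.add(new Object());
            }
        }
    }

    // Path taken by the compactor-like thread: same order as deepCopy().
    void switchNow() {
        synchronized (copies) {
            synchronized (iteratorLock) {
                switches++;
            }
        }
    }

    int copyCount() {
        synchronized (copies) {
            return copies.size();
        }
    }

    int switchCount() {
        synchronized (copies) {
            synchronized (iteratorLock) {
                return switches;
            }
        }
    }

    // Runs both paths concurrently and waits for both threads to finish.
    static LockOrderSketch runBoth(int rounds) {
        LockOrderSketch s = new LockOrderSketch();
        Thread scan = new Thread(() -> {
            for (int i = 0; i < rounds; i++) s.deepCopy();
        }, "scan");
        Thread compactor = new Thread(() -> {
            for (int i = 0; i < rounds; i++) s.switchNow();
        }, "minor compactor");
        scan.start();
        compactor.start();
        try {
            scan.join();
            compactor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return s;
    }

    public static void main(String[] args) {
        LockOrderSketch s = runBoth(10_000);
        System.out.println(s.copyCount() + " copies, " + s.switchCount() + " switches");
    }
}
```

If one of the two methods took the monitors in the opposite order, the two threads could block each other in the way the quoted thread dump below describes. With a single fixed order, both loops run to completion.
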
> deadlock in SourceSwitchingIterator
> -----------------------------------
>
>                 Key: ACCUMULO-3745
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3745
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>         Environment: Large production cluster, with complex iterator trees.
>            Reporter: Eric Newton
>            Priority: Blocker
>             Fix For: 1.5.3, 1.7.0, 1.6.3
>
>         Attachments: ACCUMULO-3745-1.patch
>
>
> The details come from an offline cluster, so it is difficult to reproduce them exactly. A very complex iterator was running over a tablet. "deepCopy" may have been called a couple dozen times, which may have contributed to the problem.
> Relevant facts:
> A scan and a minor compaction created a deadlock, which was detected by the Java runtime.
> {noformat}
> "Query...":
>   waiting to lock monitor 0x1234 (object 0x1234, a java.util.Collections$SynchronizedRandomAccessList),
>   which is held by "minor compactor 1"
> "minor compactor 1":
>   waiting to lock monitor 0x9876 (object 0x9876, a org.apache.accumulo.core.iterators.system.SourceSwitchingIterator),
>   which is held by "Query..."
> {noformat}
> Java stacks:
> {noformat}
> "Query..."
>     at java.util.Collections$SynchronizedCollection.add(Collections.java:1636)
>     - waiting to lock <0x1234> (a java.util.Collections$SynchronizedRandomAccessList)
>     at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.<init>(SourceSwitchingIterator.java:72)
>     at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.deepCopy(SourceSwitchingIterator.java:85)
>     - locked <0x9876> (a org.apache.accumulo.core.iterators.system.SourceSwitchingIterator)
>     ... PartialMutationSkippingIterator.deepCopy(InMemoryMap.java:113)
>     ... InMemoryMap$MemoryIterator.deepCopy(InMemoryMap.java:623)
>     ...
> {noformat}
> and:
> {noformat}
> "minor compactor 1":
>     at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator._switchNow(SourceSwitchingIterator.java:171)
>     - waiting to lock <0x9876> (a org.apache.accumulo.core.iterators.system.SourceSwitchingIterator)
>     at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.switchNow(SourceSwitchingIterator.java:184)
>     - locked <0x1234> (a java.util.Collections$SynchronizedRandomAccessList)
>     at org.apache.accumulo.tserver.InMemoryMap$MemoryIterator.switchNow(InMemoryMap.java:647)
>     ...
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)