accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3745) deadlock in SourceSwitchingIterator
Date Wed, 22 Apr 2015 18:52:00 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507658#comment-14507658 ]

Eric Newton commented on ACCUMULO-3745:
---------------------------------------

The two locks that are mutually held are:

* the lock on {{copies}}, a synchronized list
* the lock on the SourceSwitchingIterator itself

The SourceSwitchingIterator adds itself to copies in its constructor (which isn't the best
form, but ignore that for the moment). deepCopy() holds the lock on the iterator while the new
copy is being initialized, so the lock order on that path is this, then copies.

But the call to switchNow() locks copies, and then _switchNow() locks this, which is the opposite order.

There are two possible fixes:

# don't call _switchNow while holding the lock on copies: make a copy of the list (under the lock), release the lock, and then call _switchNow
# move the synchronized block to switchNow, and remove it from _switchNow
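
A minimal sketch of the first fix, assuming a simplified stand-in class rather than the actual Accumulo code. The names SwitchingIteratorSketch, newRoot(), switchCount and getSwitchCount() are hypothetical and exist only for illustration; the point is that switchNow() takes a snapshot of copies under the list's lock and releases that lock before any iterator lock is taken:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for SourceSwitchingIterator; NOT the Accumulo class.
class SwitchingIteratorSketch {
    // Synchronized list shared by an iterator and all of its deep copies.
    private final List<SwitchingIteratorSketch> copies;
    // Illustrative counter so the effect of a switch can be observed.
    private int switchCount = 0;

    SwitchingIteratorSketch(List<SwitchingIteratorSketch> copies) {
        this.copies = copies;
        copies.add(this); // briefly takes the lock on copies
    }

    // Hypothetical factory for the first iterator in a tree.
    static SwitchingIteratorSketch newRoot() {
        return new SwitchingIteratorSketch(
                Collections.synchronizedList(new ArrayList<>()));
    }

    // Lock order on this path: this, then copies (inside the constructor).
    synchronized SwitchingIteratorSketch deepCopy() {
        return new SwitchingIteratorSketch(copies);
    }

    // Locks only this iterator.
    private synchronized void _switchNow() {
        switchCount++;
    }

    // Fix 1: copy the list under its lock, release the lock, then switch.
    // The copies lock is never held while waiting for an iterator lock.
    void switchNow() {
        List<SwitchingIteratorSketch> snapshot;
        synchronized (copies) {
            snapshot = new ArrayList<>(copies);
        }
        for (SwitchingIteratorSketch it : snapshot) {
            it._switchNow();
        }
    }

    synchronized int getSwitchCount() {
        return switchCount;
    }
}
```

With this shape, a thread in deepCopy() can still hold an iterator's lock while it waits for copies, but no thread waits for an iterator's lock while holding copies, so the cycle cannot form. The trade-off is that a copy added after the snapshot is taken is not switched by that call; whether that is acceptable depends on the real iterator's semantics.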


> deadlock in SourceSwitchingIterator
> -----------------------------------
>
>                 Key: ACCUMULO-3745
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3745
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>         Environment: Large production cluster, with complex iterator trees.
>            Reporter: Eric Newton
>            Priority: Blocker
>             Fix For: 1.7.0, 1.6.3
>
>
> Details come from an offline cluster, so it's difficult to reproduce the exact details. A very complex iterator was running over a tablet. "deepCopy" may have been called a couple dozen times, which may have contributed to the problem.
> Relevant facts:
> A scan and a minor compaction created a deadlock which was detected by the Java runtime.
> {noformat}
> "Query... ":
>   waiting to lock monitor 0x1234 (object 0x1234, a java.util.Collections$SynchronizedRandomAccessList),
>   which is held by "minor compactor 1"
> "minor compactor 1":
>   waiting to lock monitor 0x9876 (object 0x9876, a org.apache.accumulo.core.iterators.system.SourceSwitchingIterator),
>   which is held by "Query..."
> {noformat}
> Java stacks:
> {noformat}
> "Query..."
>   at java.util.Collections$SynchronizedCollection.add(Collections.java:1636)
>   - waiting to lock <0x1234> (a java.util.Collections$SynchronizedRandomAccessList)
>   at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.<init>(SourceSwitchingIterator.java:72)
>   at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.deepCopy(SourceSwitchingIterator.java:85)
>   - locked <0x9876> (a org.apache.accumulo.core.iterators.system.SourceSwitchingIterator)
>   ... PartialMutationSkippingIterator.deepCopy(InMemoryMap.java:113)
>   ... InMemoryMap$MemoryIterator.deepCopy(InMemoryMap.java:623)
>   ...
> {noformat}
> and:
> {noformat}
> "minor compactor 1":
>   at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator._switchNow(SourceSwitchingIterator.java:171)
>   - waiting to lock <0x9876> (a org.apache.accumulo.core.iterators.system.SourceSwitchingIterator)
>   at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.switchNow(SourceSwitchingIterator.java:184)
>   - locked <0x1234> (a java.util.Collections$SynchronizedRandomAccessList)
>   at org.apache.accumulo.tserver.InMemoryMap$MemoryIterator.switchNow(InMemoryMap.java:647)
>   ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
