accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3774) Deadlock after recovering root tablet
Date Wed, 06 May 2015 17:18:00 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530957#comment-14530957 ]

Josh Elser commented on ACCUMULO-3774:
--------------------------------------

I think I just noticed this in our nightly test runs in a slightly different way. I tried to
run {{accumulo admin stopAll}}, which hung. All user tables are unloaded, but metadata and
root are still loaded, with a stuck minor compaction (minc) on metadata:

{noformat}
"Minor compacting !0;~<" daemon prio=10 tid=0x0000000003e01800 nid=0x2c9f in Object.wait() [0x00007fb2d1883000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000f1b6a6a8> (a org.apache.accumulo.core.client.impl.TabletServerBatchWriter)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.waitRTE(TabletServerBatchWriter.java:459)
        at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.close(TabletServerBatchWriter.java:352)
        - locked <0x00000000f1b6a6a8> (a org.apache.accumulo.core.client.impl.TabletServerBatchWriter)
        at org.apache.accumulo.core.client.impl.BatchWriterImpl.close(BatchWriterImpl.java:54)
        at org.apache.accumulo.server.util.MetadataTableUtil.markLogUnused(MetadataTableUtil.java:1134)
        at org.apache.accumulo.tserver.TabletServer.markUnusedWALs(TabletServer.java:3032)
        at org.apache.accumulo.tserver.TabletServer.minorCompactionFinished(TabletServer.java:2917)
        at org.apache.accumulo.tserver.tablet.DatafileManager.bringMinorCompactionOnline(DatafileManager.java:440)
        at org.apache.accumulo.tserver.tablet.Tablet.minorCompact(Tablet.java:956)
        at org.apache.accumulo.tserver.tablet.MinorCompactionTask.run(MinorCompactionTask.java:84)
        at org.apache.accumulo.tserver.tablet.Tablet.initiateClose(Tablet.java:1407)
        at org.apache.accumulo.tserver.tablet.Tablet.close(Tablet.java:1338)
        at org.apache.accumulo.tserver.TabletServer$UnloadTabletHandler.run(TabletServer.java:1963)
        at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - <0x00000000f0b7dd60> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{noformat}

> Deadlock after recovering root tablet
> -------------------------------------
>
>                 Key: ACCUMULO-3774
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3774
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: Hadoop 2.7.0, ZK 3.4.6, Accumulo 83d1b8388ad807d678c9a3a922e5025faa9a5933, 20 node m3.large EC2 cluster
>            Reporter: Keith Turner
>            Priority: Blocker
>             Fix For: 1.7.0
>
>
> I started CI running against 1.7.0-SNAP. After CI ran for a while, I started agitation.
> Then everything froze up. The root tablet node was killed, the root tablet had a lot of
> walogs (will open a separate issue for this), and the root tablet was reloaded on another
> machine. However, it hung while loading with the following issue: the minor compaction
> after recovery was trying to write to the root tablet. This happened before the root
> tablet location was set.
> {noformat}
> "Minor compacting +r<<" daemon prio=10 tid=0x00000000046cd800 nid=0x3508 in Object.wait() [0x00007fb0ac3b1000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.waitRTE(TabletServerBatchWriter.java:459)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.close(TabletServerBatchWriter.java:352)
>         - locked <0x000000078d154840> (a org.apache.accumulo.core.client.impl.TabletServerBatchWriter)
>         at org.apache.accumulo.core.client.impl.BatchWriterImpl.close(BatchWriterImpl.java:54)
>         at org.apache.accumulo.server.util.MetadataTableUtil.markLogUnused(MetadataTableUtil.java:1131)
>         at org.apache.accumulo.tserver.TabletServer.markUnusedWALs(TabletServer.java:3032)
>         at org.apache.accumulo.tserver.TabletServer.minorCompactionFinished(TabletServer.java:2917)
>         at org.apache.accumulo.tserver.tablet.DatafileManager.bringMinorCompactionOnline(DatafileManager.java:440)
>         at org.apache.accumulo.tserver.tablet.Tablet.minorCompact(Tablet.java:956)
>         at org.apache.accumulo.tserver.tablet.MinorCompactionTask.run(MinorCompactionTask.java:84)
>         at org.apache.accumulo.tserver.tablet.Tablet.minorCompactNow(Tablet.java:1080)
>         at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2124)
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$3.run(TabletServer.java:1510)
> {noformat}
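
The ordering problem in the quoted description (the post-recovery minor compaction writes to the root tablet before the root tablet location is set) can be sketched in isolation. This is a minimal illustration, not Accumulo code, and it does not claim to show how the issue was or should be fixed. The class {{RootTabletLoadSketch}}, its methods, and the latch standing in for "location is set" are all hypothetical names. A timeout replaces the unbounded {{Object.wait()}} from the trace so the sketch terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RootTabletLoadSketch {
    // Hypothetical stand-in for "the root tablet location has been published".
    private final CountDownLatch locationSet = new CountDownLatch(1);

    // Stand-in for a metadata write that cannot complete until the location is known.
    // Returns true if the write could proceed within the timeout.
    public boolean writeToRoot(long timeoutMillis) {
        try {
            return locationSet.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Ordering described in the report: the loading thread writes first and
    // only afterwards publishes the location, so the write waits on itself.
    public boolean compactThenPublish(long timeoutMillis) {
        boolean wrote = writeToRoot(timeoutMillis); // unbounded wait in the real trace
        locationSet.countDown();
        return wrote;
    }

    // Opposite ordering, shown only for contrast: the location is published
    // before the write is attempted, so the write does not block.
    public boolean publishThenCompact(long timeoutMillis) {
        locationSet.countDown();
        return writeToRoot(timeoutMillis);
    }

    public static void main(String[] args) {
        System.out.println("compact then publish wrote: "
            + new RootTabletLoadSketch().compactThenPublish(200));
        System.out.println("publish then compact wrote: "
            + new RootTabletLoadSketch().publishThenCompact(200));
    }
}
```

In the sketch the first ordering times out because the only thread that can release the latch is the one waiting on it, which mirrors the single "Minor compacting +r<<" thread parked in {{waitRTE}} above.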



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
