accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3471) Adding a new tserver puts some tables offline for few minutes
Date Wed, 14 Jan 2015 06:15:35 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276542#comment-14276542 ]

Josh Elser commented on ACCUMULO-3471:
--------------------------------------

Looking at the code and jstack of a tserver, I think I just convinced myself the numbers were
moving faster. Practically all of the threads are stuck trying to get the recovery lock instead
of actually doing anything. Then, like Denis said, there's one assignment updating the metadata
table.

{noformat}
"tablet assignment 1" daemon prio=10 tid=0x000000000269e800 nid=0x5bc8 in Object.wait() [0x00007f47b9a2c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.waitRTE(TabletServerBatchWriter.java:438)
        at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.close(TabletServerBatchWriter.java:340)
        - locked <0x0000000605fe16f8> (a org.apache.accumulo.core.client.impl.TabletServerBatchWriter)
        at org.apache.accumulo.core.client.impl.BatchWriterImpl.close(BatchWriterImpl.java:54)
        at org.apache.accumulo.server.master.state.MetaDataStateStore.setLocations(MetaDataStateStore.java:80)
        at org.apache.accumulo.server.master.state.TabletStateStore.setLocation(TabletStateStore.java:83)
        at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2143)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at org.apache.accumulo.tserver.ActiveAssignmentRunnable.run(ActiveAssignmentRunnable.java:61)
        at org.apache.accumulo.core.trace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.accumulo.core.trace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at java.lang.Thread.run(Thread.java:745)

"tablet assignment 2" daemon prio=10 tid=0x0000000002118000 nid=0x5bca waiting on condition [0x00007f47b9c2e000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000061aff0998> (a java.util.concurrent.locks.ReentrantLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
        at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:229)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
        at org.apache.accumulo.tserver.TabletServer.acquireRecoveryMemory(TabletServer.java:2201)
        at org.apache.accumulo.tserver.TabletServer.access$2600(TabletServer.java:246)
        at org.apache.accumulo.tserver.TabletServer$AssignmentHandler.run(TabletServer.java:2118)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at org.apache.accumulo.tserver.ActiveAssignmentRunnable.run(ActiveAssignmentRunnable.java:61)
        at org.apache.accumulo.core.trace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.accumulo.core.trace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at java.lang.Thread.run(Thread.java:745)
{noformat}
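
To make the pattern in the two traces easier to reason about, here is a minimal, self-contained sketch of workers serializing on one fair ReentrantLock while the current holder is blocked on slow work. It is not the TabletServer code: the class name, method name, and latch are hypothetical, and it assumes the slow thread holds the lock while it waits, which the dump above does not confirm.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names exist in Accumulo.
class RecoveryLockContention {

    /**
     * Starts `workers` threads that all need the same fair lock, keeps the
     * first holder busy, and returns how many threads were parked behind it.
     */
    static int queuedWhileHolderIsBusy(int workers) {
        // Fair lock, matching the ReentrantLock$FairSync frame in the dump.
        ReentrantLock recoveryLock = new ReentrantLock(true);
        // Stands in for the slow work done by whichever thread holds the lock.
        CountDownLatch slowWriteDone = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int i = 0; i < workers; i++) {
                pool.execute(() -> {
                    recoveryLock.lock();
                    try {
                        // The first holder blocks here; later holders pass straight through.
                        slowWriteDone.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        recoveryLock.unlock();
                        finished.countDown();
                    }
                });
            }
            // Wait until every other worker is parked on the lock.
            while (recoveryLock.getQueueLength() < workers - 1) {
                Thread.sleep(1);
            }
            int queued = recoveryLock.getQueueLength();
            slowWriteDone.countDown(); // let the holder finish so the queue drains
            finished.await();
            return queued;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing the lock queue", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("queued behind the lock holder: " + queuedWhileHolderIsBusy(8));
    }
}
```

With 8 workers, 7 end up parked behind the single holder, which is the shape a jstack would show as one thread doing work and the rest in WAITING (parking) on the same lock address.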



> Adding a new tserver puts some tables offline for few minutes
> -------------------------------------------------------------
>
>                 Key: ACCUMULO-3471
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3471
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 12.04
>            Reporter: Denis Petrov
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: ACCUMULO-3471-balance-test.patch
>
>
> I run an Accumulo cluster with 15 tservers with about 6000 tablets on each (disks are
> quite slow - each node has 2*4TB SATA).
> When a new tserver is added to the cluster, the rebalancing procedure starts.
> During this procedure some tablets are offline and unreachable for 5-10 minutes.
> This is visible in http://monitor:50095/tables and as timeouts on the client side.
> The rebalancing caused by killing a tserver converges much faster than the rebalancing
> caused by adding a tserver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
