hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
Date Thu, 16 Jul 2015 05:02:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629195#comment-14629195 ]

Hadoop QA commented on HBASE-13971:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12745564/13971-v1.txt
  against master branch at commit 6c6c7c51f6bd31af1fa99e3d76ab54a7613c4086.
  ATTACHMENT ID: 12745564

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions
(2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the total number of
protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the total number
of checkstyle errors

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14794//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14794//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14794//artifact/patchprocess/checkstyle-aggregate.html

  Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14794//console

This message is automatically generated.

> Flushes stuck since 6 hours on a regionserver.
> ----------------------------------------------
>
>                 Key: HBASE-13971
>                 URL: https://issues.apache.org/jira/browse/HBASE-13971
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.3.0
>         Environment: Occurred while running IntegrationTestLoadAndVerify for 20 M rows on a cluster with 32 region servers, each with a max heap size of 24 GB.
>            Reporter: Abhilash
>            Assignee: Ted Yu
>            Priority: Critical
>         Attachments: 13971-v1.txt, 13971-v1.txt, 13971-v1.txt, jstack.1, jstack.2, jstack.3,
jstack.4, jstack.5, rsDebugDump.txt, screenshot-1.png
>
>
> One region server is stuck while flushing (possible deadlock). It has been trying to flush two regions for the last 6 hours (see the screenshot).
> This occurred while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. There have been ~37 million writes on each regionserver so far, but no writes have happened on any regionserver for the past 6 hours, and their memstore size is zero (I don't know if this is related). This particular regionserver, however, has had a memstore size of 9 GB for the past 6 hours.
> Relevant snippets from the debug dump:
> Tasks:
> ===========================================================
> Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd.
> Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd
> Running for 22034s
> Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390.
> Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390
> Running for 22033s
> Executors:
> ===========================================================
> ...
> Thread 139 (MemStoreFlusher.1):
>   State: WAITING
>   Blocked count: 139711
>   Waited count: 239212
>   Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>     java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>     org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
>     org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
>     org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
>     org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>     java.lang.Thread.run(Thread.java:745)
> Thread 137 (MemStoreFlusher.0):
>   State: WAITING
>   Blocked count: 138931
>   Waited count: 237448
>   Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>     java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>     org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
>     org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
>     org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
>     org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
>     org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
>     org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>     java.lang.Thread.run(Thread.java:745)
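
Both MemStoreFlusher threads in the dump above are parked in an untimed CountDownLatch.await() reached from WALKey.getSequenceId. The following is a minimal sketch of that waiting pattern, not HBase's actual code: the class PendingSequenceId and its methods are hypothetical names invented for illustration. It shows that an untimed await() keeps the caller parked for as long as countDown() is not called, while a timed await() returns so the caller can react.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchWaitSketch {

    /** Hypothetical stand-in for a key whose sequence id is assigned later by another thread. */
    static final class PendingSequenceId {
        private final CountDownLatch assigned = new CountDownLatch(1);
        private volatile long sequenceId = -1L;

        /** Called by the (hypothetical) writer thread once the id is known. */
        void assign(long id) {
            sequenceId = id;
            assigned.countDown();
        }

        /** Untimed wait: parks the caller until assign() runs, however long that takes. */
        long awaitSequenceId() throws InterruptedException {
            assigned.await();
            return sequenceId;
        }

        /** Timed wait: returns -1 if no id was assigned within the timeout. */
        long awaitSequenceId(long timeout, TimeUnit unit) throws InterruptedException {
            return assigned.await(timeout, unit) ? sequenceId : -1L;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PendingSequenceId pending = new PendingSequenceId();

        // Nobody ever calls pending.assign(), so this thread stays parked.
        Thread waiter = new Thread(() -> {
            try {
                pending.awaitSequenceId();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "flusher-sketch");
        waiter.setDaemon(true); // let the JVM exit even though the thread never wakes
        waiter.start();

        Thread.sleep(200);
        System.out.println("untimed waiter state: " + waiter.getState());

        // A timed wait gives up instead of hanging, so the caller can log or retry.
        long id = pending.awaitSequenceId(100, TimeUnit.MILLISECONDS);
        System.out.println("timed wait result: " + id);
    }
}
```

A thread dump taken while the untimed waiter is parked would be expected to show frames similar to the CountDownLatch.await and LockSupport.park lines above. Whether a timeout is appropriate in the real flush path is a separate design question that this sketch does not answer.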



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
