hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10882) Bulkload process hangs on regions randomly and finally throws RegionTooBusyException
Date Tue, 17 Jun 2014 04:49:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033434#comment-14033434 ]

stack commented on HBASE-10882:
-------------------------------

[~victorunique] Yes, that is an odd-looking stack trace. I agree it looks like "NO ONE OWNED
THE LOCK". I opened HBASE-11368 because this issue comes up from time to time.

> Bulkload process hangs on regions randomly and finally throws RegionTooBusyException
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-10882
>                 URL: https://issues.apache.org/jira/browse/HBASE-10882
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.10
>         Environment: rhel 5.6, jdk1.7.0_45, hadoop-2.2.0-cdh5.0.0
>            Reporter: Victor Xu
>         Attachments: jstack_5105.log
>
>
> I came across this problem in the early morning several days ago. It happened when I used
the hadoop completebulkload command to bulk load some HDFS files into an HBase table. Several
regions hung, and after three retries they all threw RegionTooBusyExceptions. Fortunately, I
captured the jstack output of the HRegionServer hosting one of the failing regions just in time.
> I found that the bulkload process was waiting for a write lock:
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
> The lock id is 0x00000004054ecbf0.
> In the meantime, many other Get/Scan operations were also waiting on the same lock id; they,
of course, were waiting for the read lock:
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:873)
> The most ridiculous thing is that NO ONE OWNED THE LOCK! I searched the jstack output
carefully, but could not find any thread that claimed to own the lock.
> When I restarted the bulk load process, it failed on different regions but with the same
RegionTooBusyExceptions.
> I guess the region may have been running a compaction at the time and holding the lock,
but I couldn't find any compaction info in the HBase logs.
> Finally, after several days of hard work, I found the only temporary workaround:
TRIGGERING A MAJOR COMPACTION BEFORE THE BULKLOAD.
> So which process owned the lock? Has anyone come across the same problem before?
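
For context on the locking pattern described above: my understanding is that the bulkload path
takes the region's exclusive write lock via tryLock() with a timeout and surfaces a timeout as
RegionTooBusyException, while Gets/Scans share the read lock. A rough sketch of that pattern,
with made-up names (RegionLockSketch, bulkLoadFiles and BUSY_WAIT_MS are illustrative, not the
actual HRegion code):

import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Rough, illustrative sketch (names are made up, this is not HRegion): a bulk load
// needs the exclusive write lock, Gets/Scans share the read lock, and a writer that
// cannot acquire the lock within the timeout gives up with a "region too busy" style
// error, which is what the client eventually sees after its retries are exhausted.
public class RegionLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private static final long BUSY_WAIT_MS = 60000;   // assumed timeout, not HBase's default

    /** Bulk-load style operation: requires the exclusive lock. */
    public void bulkLoadFiles() throws IOException {
        boolean locked = false;
        try {
            locked = lock.writeLock().tryLock(BUSY_WAIT_MS, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        if (!locked) {
            throw new IOException("region is too busy: could not get the write lock");
        }
        try {
            // ... move the HFiles into the region's store directories ...
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Get/Scan style operation: many of these can share the read lock. */
    public void read() {
        lock.readLock().lock();
        try {
            // ... serve the Get/Scan ...
        } finally {
            lock.readLock().unlock();
        }
    }
}

If this is roughly what happens, then anything that sits on the read lock long enough, or a
lock that is simply never released, makes every writer time out the same way, which would
match the symptoms above.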



--
This message was sent by Atlassian JIRA
(v6.2#6252)
