hbase-issues mailing list archives

From "Victor Xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-10882) Bulkload process hangs on regions randomly and finally throws RegionTooBusyException
Date Tue, 01 Apr 2014 02:58:14 GMT
Victor Xu created HBASE-10882:

             Summary: Bulkload process hangs on regions randomly and finally throws RegionTooBusyException
                 Key: HBASE-10882
                 URL: https://issues.apache.org/jira/browse/HBASE-10882
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.94.10
         Environment: rhel 5.6, jdk1.7.0_45, hadoop-2.2.0-cdh5.0.0
            Reporter: Victor Xu

I came across this problem in the early morning several days ago. It happened when I used the hadoop
completebulkload command to bulk load some HDFS files into an HBase table. Several regions hung,
and after three retries they all threw RegionTooBusyExceptions. Fortunately, I captured the jstack
output of the HRegionServer hosting one of the affected regions just in time.
I found that the bulkload process was waiting for a write lock:
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
The lock id is 0x00000004054ecbf0.
In the meantime, many other Get/Scan operations were also waiting on the same lock id, and, of
course, they were waiting for the read lock:
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:873)
The most ridiculous thing is that NO ONE OWNED THE LOCK! I searched the jstack output carefully,
but could not find any thread that claimed to own it.
When I restarted the bulkload process, it failed on different regions, but with the same RegionTooBusyExceptions.
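For context, those two stack frames match the usual bounded read/write-lock pattern around region
operations: the bulkload path tries to take the exclusive write lock and gives up with a
RegionTooBusyException after a wait budget, while Gets/Scans take the shared read lock the same way.
Below is only a minimal sketch of that pattern in Java, assuming one ReentrantReadWriteLock per
region and an illustrative wait constant; it is not the actual HRegion source, and all names in it
are made up.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal sketch only -- not HRegion code. RegionLockSketch, BUSY_WAIT_MILLIS
// and the local RegionTooBusyException stand-in are all illustrative names.
public class RegionLockSketch {

    static class RegionTooBusyException extends RuntimeException {
        RegionTooBusyException(String msg) { super(msg); }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private static final long BUSY_WAIT_MILLIS = 60000;

    // Bulkload-style operation: needs the exclusive (write) lock. If it cannot
    // get it within the wait budget, the caller sees RegionTooBusyException,
    // which is the behaviour described above.
    void bulkLoadHFiles() throws InterruptedException {
        if (!lock.writeLock().tryLock(BUSY_WAIT_MILLIS, TimeUnit.MILLISECONDS)) {
            throw new RegionTooBusyException(
                "failed to get the write lock in " + BUSY_WAIT_MILLIS + " ms");
        }
        try {
            // move the HFiles into the region's store directories
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Get/Scan-style operation: only needs the shared (read) lock, but a timed
    // tryLock can still queue behind a writer that is already waiting, so
    // reads pile up as soon as the bulkload is stuck.
    void get() throws InterruptedException {
        if (!lock.readLock().tryLock(BUSY_WAIT_MILLIS, TimeUnit.MILLISECONDS)) {
            throw new RegionTooBusyException(
                "failed to get the read lock in " + BUSY_WAIT_MILLIS + " ms");
        }
        try {
            // serve the read
        } finally {
            lock.readLock().unlock();
        }
    }
}

In this shape, any long-lived holder of either side of the lock is enough to make the bulkload
exhaust its retries with RegionTooBusyException.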

I guess the region might have been doing some compactions at that time and holding the lock, but I
couldn't find any compaction info in the HBase logs.
Finally, after several days’ hard work, the only temporary solution to this problem was
So which process owned the lock? Has anyone come across the same problem before?
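One possible reason the dump looks ownerless: jstack lists write-lock owners under "Locked ownable
synchronizers", but as far as I know threads holding only the read lock of a ReentrantReadWriteLock
are not reported there. The small standalone demo below (hypothetical code, not HBase) reproduces
the picture: a thread sitting on the read lock makes a timed writeLock().tryLock() expire, exactly
like the bulkload frame in the jstack output.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Standalone demo of the symptom -- all names hypothetical, not HBase code.
public class StuckRegionLockDemo {
    public static void main(String[] args) throws Exception {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Stands in for whatever held the read lock on the real region
        // (a stuck scanner, a compaction reader, ...): take it and never let go.
        Thread stuckReader = new Thread(new Runnable() {
            public void run() {
                lock.readLock().lock();
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException ignored) {
                    // interrupted: fall through and release
                } finally {
                    lock.readLock().unlock();
                }
            }
        }, "stuck-reader");
        stuckReader.setDaemon(true);
        stuckReader.start();
        Thread.sleep(200); // give the reader time to grab the lock

        // The "bulkload" writer: times out just like the write-lock tryLock
        // frame in the jstack above.
        boolean gotWrite = lock.writeLock().tryLock(2, TimeUnit.SECONDS);
        System.out.println("writer got write lock: " + gotWrite); // prints false

        // A jstack taken during those two seconds shows this thread parked in
        // tryLock, but the read-lock holder is NOT listed under
        // "Locked ownable synchronizers" (only write-lock owners are), so the
        // dump can look as if nobody owns the lock.
    }
}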

