hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: found one deadlock on hbase?
Date Fri, 29 Apr 2011 03:48:28 GMT
Yes.  The below looks viable (though it's strange we have not seen it
before now).  The profiler may have slowed things enough to bring on the
deadlock -- or the run up to the high water mark -- but it's still a
deadlock.
Please file a critical priority issue.

If you have a patch, that'd be excellent.

Thanks for digging in on this,
St.Ack


2011/4/28 Zhoushuaifeng <zhoushuaifeng@huawei.com>:
> Thanks, I will do more tests.
> Maybe the deadlock happened like this? Please point it out if it's wrong.
>
> 1, One handler is handling a put op and calls reclaimMemStoreMemory, but memory
> is above the high water mark (isAboveHighWaterMark), so this handler locks the
> memstoreflusher until global memory drops:
>
> public synchronized void reclaimMemStoreMemory() {
>    if (isAboveHighWaterMark()) {
>      lock.lock();
>      try {
>        while (isAboveHighWaterMark() && !server.isStopped()) {
>          wakeupFlushThread();
>          try {
>            // we should be able to wait forever, but we've seen a bug where
>            // we miss a notify, so put a 5 second bound on it at least.
>            flushOccurred.await(5, TimeUnit.SECONDS);
>          } catch (InterruptedException ie) {
>            Thread.currentThread().interrupt();
>          }
>        }
>      } finally {
>        lock.unlock();
>
> 2, flushforGlobalPressure is triggered, but to flush the memstore it needs to
> lock the memstoreflusher:
>
>  private boolean flushRegion(final HRegion region, final boolean emergencyFlush) {
>    synchronized (this.regionsInQueue) {
>      FlushRegionEntry fqe = this.regionsInQueue.remove(region);
>      if (fqe != null && emergencyFlush) {
>        // Need to remove from region from delay queue.  When NOT an
>        // emergencyFlush, then item was removed via a flushQueue.poll.
>        flushQueue.remove(fqe);
>     }
>     lock.lock();
>    }
>
> 3, Because the lock is held by the IPC handler of the put op, flushRegion will
> never get the lock and the flush will never happen.
> 4, With no flush, memory stays above the high water mark and the lock is never
> released, so a deadlock happens.
>
> Is it right?
>
> Zhou Shuaifeng(Frank)
>
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: April 29, 2011 3:09
> To: user@hbase.apache.org
> Subject: Re: found one deadlock on hbase?
>
> Like I said in the previous thread you made about this issue, it seems
> that the YourKit profiler is doing something unexpected from the HBase
> POV. Can you try running without it and see if it still happens?
>
> J-D
>
> 2011/4/28 Zhoushuaifeng <zhoushuaifeng@huawei.com>:
>> Thanks, version is 0.90.1
>>
>> Zhou Shuaifeng(Frank)
>>
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
>> Sent: April 28, 2011 13:10
>> To: user@hbase.apache.org
>> Cc: Yanlijun
>> Subject: Re: found one deadlock on hbase?
>>
>> Must be a deadlock if the dumb JVM can figure it out.  What version of
>> hbase please so I can dig into source code?
>> Thanks,
>> St.Ack
>>
>>
>
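
The four steps in the quoted analysis can be reproduced in isolation with a
small sketch. This is not HBase code: the class FlushLockSketch and its
methods and fields are hypothetical stand-ins, and it models only the premise
of the analysis, namely that the handler keeps the lock for the whole time
memory is above the high water mark. It does not model Condition.await, which
releases its lock while waiting. The flusher uses tryLock with a timeout so
the demo reports the stall instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class FlushLockSketch {
    // Hypothetical stand-ins for the flusher state; names are illustrative only.
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicBoolean aboveHighWaterMark = new AtomicBoolean(true);
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final CountDownLatch handlerHoldsLock = new CountDownLatch(1);

    // Step 1: the put handler takes the lock and keeps it while memory is high.
    void handlerReclaim() throws InterruptedException {
        lock.lock();
        try {
            handlerHoldsLock.countDown();
            while (aboveHighWaterMark.get() && !stopped.get()) {
                // Assumption of the sketch: the wait does NOT release the lock.
                Thread.sleep(10);
            }
        } finally {
            lock.unlock();
        }
    }

    // Step 2: the flusher needs the same lock before it can free memory.
    boolean tryFlush(long timeoutMs) throws InterruptedException {
        if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // Step 3: the lock never becomes available.
        }
        try {
            aboveHighWaterMark.set(false); // a flush would lower memory use
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Runs the scenario once and reports whether the flush managed to run.
    public static boolean flushSucceedsWhileHandlerWaits() throws InterruptedException {
        FlushLockSketch s = new FlushLockSketch();
        Thread handler = new Thread(() -> {
            try {
                s.handlerReclaim();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();
        s.handlerHoldsLock.await();      // make sure step 1 has happened
        boolean flushed = s.tryFlush(200);
        s.stopped.set(true);             // break the cycle so the demo terminates
        handler.join();
        return flushed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("flush ran: " + flushSucceedsWhileHandlerWaits());
    }
}
```

Under these assumptions the flush never runs while the handler is waiting
(step 4), and only the stopped flag, playing the role of server.isStopped(),
lets the handler leave the loop.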
