hbase-issues mailing list archives

From "Subbu M Iyer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3654) Weird blocking between getOnlineRegion and createRegionLoad
Date Wed, 23 Mar 2011 04:11:06 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009991#comment-13009991 ]

Subbu M Iyer commented on HBASE-3654:
-------------------------------------

1. Created a dummy class with two APIs, getOnlineRegions and buildServerReport, which exactly
mimic our HRegionServer.getOnlineRegions and HRS.buildServerLoad
(i.e., both operate on a HashMap under a sync lock).

2. Created 20 threads, with 10 threads hitting getOnlineRegions and 10 hitting buildServerLoad
in a loop 100 times (just to simulate and recreate the locked readers scenario that JD reported).

3. Ran the test and captured the thread dump for the following scenarios, with onlineRegions
   represented as:
                a. HashMap, synchronized on the HashMap (as it is today)
                b. ConcurrentHashMap with no synchronization
                c. ConcurrentSkipListMap with no synchronization
                d. CopyOnWriteList

4. I could reproduce the lock scenario that JD reported in scenarios 3a, 3b, and 3c.
   In case 3c I do see blocked threads waiting at
   java.util.concurrent.ConcurrentSkipListMap$EntryIterator.next, and in case 3b at
   java.util.concurrent.ConcurrentHashMap$EntryIterator.next(ConcurrentHashMap.java:1163).
   Case 3d has no blocked threads except for one thread
   blocked at java.util.Arrays.copyOf(Arrays.java:2760) during the getOnlineRegions call.

5. I have attached all the thread dumps for your review.
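
For readers without the attachments, the setup in steps 1 to 3 could be sketched roughly as
below. This is not the attached test class. The class name OnlineRegionsContentionSketch, the
1,000 preloaded entries, the method bodies, and the run helper are all assumptions made for
illustration; only the 20 threads (10 per API), the 100 iterations, and the map types come
from the description above. Scenario 3d is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the dummy class in step 1; not HBase code. */
public class OnlineRegionsContentionSketch {
    private final Map<String, Long> onlineRegions;
    private final boolean synchronizeOnMap; // true mimics scenario 3a

    public OnlineRegionsContentionSketch(Map<String, Long> backingMap, boolean synchronizeOnMap) {
        this.onlineRegions = backingMap;
        this.synchronizeOnMap = synchronizeOnMap;
        // Assumed data set: 1,000 fake regions, loaded before any thread starts.
        for (long i = 0; i < 1_000; i++) {
            backingMap.put("region-" + i, i);
        }
    }

    /** Assumed shape of the lookup API: returns a copy of the region names. */
    public List<String> getOnlineRegions() {
        if (synchronizeOnMap) {
            synchronized (onlineRegions) {
                return new ArrayList<>(onlineRegions.keySet());
            }
        }
        return new ArrayList<>(onlineRegions.keySet());
    }

    /** Assumed shape of the report API: iterates every entry and sums the values. */
    public long buildServerLoad() {
        if (synchronizeOnMap) {
            synchronized (onlineRegions) {
                return sumValues();
            }
        }
        return sumValues();
    }

    private long sumValues() {
        long total = 0;
        for (Map.Entry<String, Long> e : onlineRegions.entrySet()) {
            total += e.getValue();
        }
        return total;
    }

    /** Starts 10 threads per API, 100 calls each, and returns the number of completed calls. */
    public static int run(Map<String, Long> map, boolean synchronizeOnMap)
            throws InterruptedException {
        OnlineRegionsContentionSketch server =
                new OnlineRegionsContentionSketch(map, synchronizeOnMap);
        AtomicInteger completedCalls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1); // release all threads together
        Thread[] threads = new Thread[20];
        for (int t = 0; t < threads.length; t++) {
            final boolean reader = t < 10;
            threads[t] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < 100; i++) {
                    if (reader) {
                        server.getOnlineRegions();
                    } else {
                        server.buildServerLoad();
                    }
                    completedCalls.incrementAndGet();
                }
            }, (reader ? "getOnlineRegions-" : "buildServerLoad-") + t);
            threads[t].start();
        }
        start.countDown();
        for (Thread thread : threads) {
            thread.join();
        }
        return completedCalls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("3a: " + run(new HashMap<>(), true));
        System.out.println("3b: " + run(new ConcurrentHashMap<>(), false));
        System.out.println("3c: " + run(new ConcurrentSkipListMap<>(), false));
    }
}
```

A run this small finishes quickly, so to catch blocked threads in a dump (for example with
jstack against the JVM's pid) the iteration count or map size would probably need to be raised.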

> Weird blocking between getOnlineRegion and createRegionLoad
> -----------------------------------------------------------
>
>                 Key: HBASE-3654
>                 URL: https://issues.apache.org/jira/browse/HBASE-3654
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.90.2
>
>
> Saw this when debugging something else:
> {code}
> "regionserver60020" prio=10 tid=0x00007f538c1c0000 nid=0x4c7 runnable [0x00007f53931da000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.hadoop.hbase.regionserver.Store.getStorefilesIndexSize(Store.java:1380)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:916)
> 	- locked <0x0000000672aa0a00> (a java.util.concurrent.ConcurrentSkipListMap)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:767)
> 	- locked <0x0000000656f62710> (a java.util.HashMap)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:722)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:591)
> 	at java.lang.Thread.run(Thread.java:662)
> "IPC Reader 9 on port 60020" prio=10 tid=0x00007f538c1be000 nid=0x4c6 waiting for monitor entry [0x00007f53932db000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
> 	- waiting to lock <0x0000000656f62710> (a java.util.HashMap)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
> 	- locked <0x0000000656e60068> (a org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> ...
> "IPC Reader 0 on port 60020" prio=10 tid=0x00007f538c08b000 nid=0x4bd waiting for monitor entry [0x00007f5393be4000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getFromOnlineRegions(HRegionServer.java:2295)
> 	- waiting to lock <0x0000000656f62710> (a java.util.HashMap)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getOnlineRegion(HRegionServer.java:2307)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2333)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.isMetaRegion(HRegionServer.java:379)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:422)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer$QosFunction.apply(HRegionServer.java:361)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer.getQosLevel(HBaseServer.java:1126)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:982)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
> 	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
> 	- locked <0x0000000656e635c8> (a org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}
> All the readers are blocked! I have the feeling something much better could be done.
