hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-10785) Metas own location should be cached
Date Wed, 09 Apr 2014 15:06:19 GMT

     [ https://issues.apache.org/jira/browse/HBASE-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HBASE-10785:

    Attachment: hbase-10785_v3.patch

Attaching the v3 patch, which removes the conf param and the cache clear on useCache==false.

bq. Is there code duplication in locateMeta? If so, does there have to be (no biggie.. just
There is some between locateMeta and locateRegionInMeta, but adding this logic in locateRegionInMeta
would make it even more complex. I think it should be fine. 
bq. I know it's a copy paste; but I don't think we should do that: often the second try is
w/o cache to be sure, but trashing the cache for the others is bad, as the default for a second
try is nearly always to bypass the cache
I think we can go either way. The nice part about removing the entry from the cache is that one
thread already knows that the cached location is no good, so it removes it, and other
threads then wait for this thread to finish looking up the new location. In some cases, this will
save unnecessary trips to the bad location (and possible socket timeouts), while in other
cases, a retry from one thread will stall the other lookups. The v3 patch removes this cache invalidation.
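
A minimal sketch of the v3 behaviour described above, assuming a single cached meta location and a registry lookup that stands in for the ZooKeeper read. The class `MetaLocationCacheSketch`, its `Supplier`-based registry, and the lookup counter are hypothetical names for illustration; this is not the code in the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not the HBase client code: caches one "meta location"
 * string and counts how often the registry (the ZooKeeper stand-in) is asked.
 */
class MetaLocationCacheSketch {
    private final Supplier<String> registry; // assumption: stands in for the zk lookup
    private final AtomicReference<String> cached = new AtomicReference<>();
    private final AtomicInteger registryLookups = new AtomicInteger();
    private final Object lookupLock = new Object();

    MetaLocationCacheSketch(Supplier<String> registry) {
        this.registry = registry;
    }

    /**
     * useCache == true: return the cached location if there is one.
     * useCache == false: ask the registry, but do NOT clear the cached entry
     * first, so other threads keep using it until the fresh value is stored.
     */
    String locateMeta(boolean useCache) {
        if (useCache) {
            String loc = cached.get();
            if (loc != null) {
                return loc;
            }
        }
        synchronized (lookupLock) {
            if (useCache) {
                // Another thread may have filled the cache while we waited.
                String loc = cached.get();
                if (loc != null) {
                    return loc;
                }
            }
            registryLookups.incrementAndGet();
            String fresh = registry.get();
            cached.set(fresh);
            return fresh;
        }
    }

    int registryLookups() {
        return registryLookups.get();
    }
}
```

With a registry that returns a new host name on each call, two `locateMeta(true)` calls cost one registry lookup. A following `locateMeta(false)` costs a second lookup and replaces the cached value, while callers that use the cache keep reading the old entry until that lookup completes.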

> Metas own location should be cached
> -----------------------------------
>                 Key: HBASE-10785
>                 URL: https://issues.apache.org/jira/browse/HBASE-10785
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: hbase-10070
>         Attachments: hbase-10785_v1.patch, hbase-10785_v2.patch, hbase-10785_v3.patch
> With the ROOT table gone, we no longer cache the location of the meta table (in MetaCache) in 0.96+. I've checked the 0.94 code, and there we cache meta, but not root.
> However, not caching meta's own location means that we make a zookeeper request every time we want to look up a region's location from meta. This means that there is a significant spike in zk requests whenever a region server goes down.
> This affects trunk, 0.98 and 0.96 as well as the hbase-10070 branch. I discovered the issue in hbase-10070 because the integration test (HBASE-10572) results in 150K requests to zk in 10 min.
> A thread dump from one of the runs has 100+ client threads in this stack trace:

> 	{code}
> 	"TimeBoundedMultiThreadedReaderThread_20" prio=10 tid=0x00007f852c2f2000 nid=0x57b6 in Object.wait() [0x00007f85059e7000]
> 	   java.lang.Thread.State: WAITING (on object monitor)
> 		at java.lang.Object.wait(Native Method)
> 		at java.lang.Object.wait(Object.java:503)
> 		at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
> 		- locked <0x00000000ea71aa78> (a org.apache.zookeeper.ClientCnxn$Packet)
> 		at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1149)
> 		at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
> 		at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:684)
> 		at org.apache.hadoop.hbase.zookeeper.ZKUtil.blockUntilAvailable(ZKUtil.java:1853)
> 		at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:186)
> 		at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60)
> 		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1126)
> 		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1112)
> 		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1220)
> 		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1129)
> 		at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:321)
> 		at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.call(RpcRetryingCallerWithReadReplicas.java:257)
> 		- locked <0x00000000e9bcf238> (a org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas)
> 		at org.apache.hadoop.hbase.client.HTable.get(HTable.java:818)
> 		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.queryKey(MultiThreadedReader.java:288)
> 		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.readKey(MultiThreadedReader.java:249)
> 		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.runReader(MultiThreadedReader.java:192)
> 		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.run(MultiThreadedReader.java:150)
> 	{code}

This message was sent by Atlassian JIRA
