hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12554) TestBaseLoadBalancer may timeout due to lengthy rack lookup
Date Fri, 21 Nov 2014 19:05:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221275#comment-14221275 ]

Ted Yu commented on HBASE-12554:
--------------------------------

bq. The 60seconds should be configurable
How about introducing a config parameter called 'hbase.ip.to.rack.determiner.timeout', with
the value given in milliseconds?
Do you think 60 seconds is an acceptable default?

bq. Does the cancel actually interrupt the ongoing lookup or does it leave it hanging?
Calling cancel(true) attempts to interrupt the thread running the ongoing lookup. See:
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html#cancel(boolean)
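A minimal sketch of the cancel(true) behaviour, using Thread.sleep() as a stand-in for the lookup. The class and method names here are hypothetical and not part of HBase. The sketch only shows that an interruptible task observes the interrupt; whether a thread blocked in a native DNS lookup reacts to it is a separate question that this does not answer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not HBase code.
class CancelDemo {
  /** Returns true if the running task saw an interrupt after cancel(true). */
  static boolean cancelInterruptsSleepingTask() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch finished = new CountDownLatch(1);
    AtomicBoolean interrupted = new AtomicBoolean(false);
    try {
      Future<?> future = pool.submit(() -> {
        started.countDown();
        try {
          // Stand-in for a slow lookup that responds to interruption.
          Thread.sleep(60_000);
        } catch (InterruptedException e) {
          interrupted.set(true);
        } finally {
          finished.countDown();
        }
      });
      started.await();        // make sure the task is actually running
      future.cancel(true);    // mayInterruptIfRunning = true
      finished.await(5, TimeUnit.SECONDS);
      return interrupted.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdownNow();
    }
  }
}
```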

bq. Who cares about a lookup in test?
Given the timeout parameter introduced above, the test can set the timeout to a very low
value, such as 10 milliseconds.
What do you think?
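
To make the proposal concrete, here is a rough sketch of a lookup bounded by a configurable timeout that falls back to an unknown-rack value. It is not the attached patch. The class name, the use of java.util.Properties in place of the HBase configuration, the "/rack1" value, and the UNKNOWN_RACK string are all assumptions made for illustration; only the parameter name and the 60 second default come from the discussion above.

```java
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not the actual RackManager.
class TimedRackLookup {
  static final String TIMEOUT_KEY = "hbase.ip.to.rack.determiner.timeout";
  static final long DEFAULT_TIMEOUT_MS = 60_000L;    // proposed default: 60 seconds
  static final String UNKNOWN_RACK = "Unknown Rack"; // placeholder value (assumption)

  /** Reads the timeout in milliseconds; Properties stands in for the real config object. */
  static long timeoutFrom(Properties conf) {
    return Long.parseLong(conf.getProperty(TIMEOUT_KEY, Long.toString(DEFAULT_TIMEOUT_MS)));
  }

  /** Runs the resolver on the pool and gives up after timeoutMs. */
  static String getRack(Callable<String> resolver, long timeoutMs, ExecutorService pool) {
    Future<String> future = pool.submit(resolver);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // attempt to interrupt the lookup thread
      return UNKNOWN_RACK;
    } catch (ExecutionException e) {
      return UNKNOWN_RACK; // the lookup itself failed
    } catch (InterruptedException e) {
      future.cancel(true);
      Thread.currentThread().interrupt();
      return UNKNOWN_RACK;
    }
  }

  /** Simulates a lookup that takes lookupMs and resolves to "/rack1". */
  static String demo(long lookupMs, long timeoutMs) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      return getRack(() -> {
        Thread.sleep(lookupMs);
        return "/rack1";
      }, timeoutMs, pool);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

With this shape, a test could set the property to 10 so that a slow lookup returns the unknown-rack value almost immediately instead of running into the 30 second test timeout.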

> TestBaseLoadBalancer may timeout due to lengthy rack lookup
> -----------------------------------------------------------
>
>                 Key: HBASE-12554
>                 URL: https://issues.apache.org/jira/browse/HBASE-12554
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 12554-v1.txt
>
>
> Here is one of the recent occurrences (https://builds.apache.org/job/PreCommit-HBASE-Build/11778/console):
> {code}
> testImmediateAssignment(org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer)  Time elapsed: 30.019 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30000 milliseconds
> 	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
> 	at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
> 	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
> 	at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
> 	at java.net.InetAddress.getAllByName(InetAddress.java:1162)
> 	at java.net.InetAddress.getAllByName(InetAddress.java:1098)
> 	at java.net.InetAddress.getByName(InetAddress.java:1048)
> 	at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:561)
> 	at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:578)
> 	at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
> 	at org.apache.hadoop.hbase.master.RackManager.getRack(RackManager.java:66)
> 	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:273)
> 	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:1113)
> 	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.randomAssignment(BaseLoadBalancer.java:1175)
> 	at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.immediateAssignment(BaseLoadBalancer.java:1145)
> 	at org.apache.hadoop.hbase.master.balancer.TestBaseLoadBalancer.testImmediateAssignment(TestBaseLoadBalancer.java:136)
> {code}
> One possible fix is to submit CachedDNSToSwitchMapping.resolve() to an executor pool for
> execution. RackManager.getRack() can then set a timeout beyond which UNKNOWN_RACK is returned.



