hbase-user mailing list archives

From "Brush,Ryan" <RBR...@CERNER.COM>
Subject Re: NoRouteToHostException causes Master abort when the RegionServer hosting ROOT is not available
Date Fri, 01 Apr 2011 17:27:30 GMT
I've verified this was indeed caused by HBASE-3660, and the fix for it
resolved the issue in our environment. Thanks!


On 4/1/11 10:57 AM, "Stack" <stack@duboce.net> wrote:

>The below looks like HBASE-3660, 'HMaster will exit when starting with
>stale data in cached locations such as -ROOT- or .META.', included in
>0.90.2 RC.
>St.Ack
>
>On Fri, Apr 1, 2011 at 8:48 AM, Brush,Ryan <RBRUSH@cerner.com> wrote:
>> This happens in similar conditions but is distinct from HBASE-3617.
>>When the region server hosting ROOT isn't available during restart, the
>>NoRouteToHostException propagates all the way up the call stack and
>>causes the master to abort.  It looks like this can be addressed by
>>handling NoRouteToHostException at some point and considering that
>>node/region server offline.
>>
>> I applied the patch from HBASE-3617 and it didn't fix the problem I'm
>>seeing, which I expected given the stack trace below.  Assuming this
>>reasoning is correct, does this merit a separate JIRA?  It does seem
>>critical in that the failure of a single node is preventing us from
>>bringing up our cluster.
>>
>> 2011-04-01 10:15:19,472 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=2, stopped=false, count of regions out on cluster=0
>> 2011-04-01 10:15:19,486 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://iphadoop01:9000/hbase/.logs/iphadoop03.northamerica.cerner.net,60020,1301665635981 belongs to an existing region server
>> 2011-04-01 10:15:19,486 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://iphadoop01:9000/hbase/.logs/iphadoop05.northamerica.cerner.net,60020,1301665659785 belongs to an existing region server
>> 2011-04-01 10:15:22,508 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
>> java.net.NoRouteToHostException: No route to host
>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
>>     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
>>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>>     at $Proxy6.getProtocolVersion(Unknown Source)
>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:385)
>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:211)
>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:458)
>>     at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:425)
>>     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:383)
>>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
>> 2011-04-01 10:15:22,510 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
>> 2011-04-01 10:15:22,510 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
>>
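
For illustration, the idea raised in the original report (handle
NoRouteToHostException and consider that region server offline instead
of letting it abort the master) might look roughly like the sketch
below. This is not the HBASE-3660 patch and it does not use HBase APIs:
the class CatalogProbe and the methods indicatesServerOffline and
verifyLocation are hypothetical names, and a plain java.net.Socket
stands in for the RPC connection seen in the stack trace. Which
exceptions should count as "offline" is also an assumption.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of HBase.
class CatalogProbe {

    // Assumption: these three failures mean "the server is not there",
    // as opposed to some other I/O problem that should still propagate.
    static boolean indicatesServerOffline(IOException e) {
        return e instanceof NoRouteToHostException
            || e instanceof ConnectException
            || e instanceof SocketTimeoutException;
    }

    // Returns true if a TCP connection to addr succeeds, false if the
    // failure indicates the server is offline. Any other IOException
    // is rethrown to the caller.
    static boolean verifyLocation(InetSocketAddress addr, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(addr, timeoutMs);
            return true;
        } catch (IOException e) {
            if (indicatesServerOffline(e)) {
                return false; // treat the cached location as stale
            }
            throw e;
        }
    }
}
```

A caller that gets false back could then treat the cached ROOT location
as stale and go on to reassign it, rather than shutting down.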

