hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: HMaster won't die waiting for RegionServer that is already dead
Date Sat, 22 Jan 2011 00:52:11 GMT
Write it up, James.  Others will probably trip on it too.
Good stuff,

On Fri, Jan 21, 2011 at 4:44 PM, James Kennedy <james.kennedy@troove.net> wrote:
> Aha that stupid dot!
> My /etc/hosts file looks pretty standard:
> localhost
> ::1     ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> However look what I found in the data-seed-specific hbase-site.xml
> <property>
>        <name>hbase.master.dns.interface</name>
>        <value>lo</value>
> </property>
> <property>
>       <name>hbase.regionserver.dns.interface</name>
>       <value>lo</value>
> </property>
> Not sure why we had that in there originally but taking it out fixes the problem. Both
> sides now resolve hregioninfo to "localhost" instead of "localhost.".  I have no idea how
> specifying the lo interface adds a period to the localhost name but that sounds like a bug
> to me. Shall I report it or is this a known issue?
> Thanks for your help,
> James Kennedy
> Project Manager
> Troove Inc.
> On 2011-01-21, at 1:34 PM, Jean-Daniel Cryans wrote:
>> There's some sort of mismatch:
>> RegionServer ephemeral node deleted, processing expiration
>> [localhost.,60020,1295592845214]
>> and
>> Waiting on regionserver(s) to go down localhost,60020,1295592845214
>> Do you see the dot after "localhost" in the first line? I wonder how
>> it got different in the znode and in ServerManager.onlineServers... In
>> any case, I'm pretty sure you can get it working by playing with your
>> /etc/hosts
>> J-D
>> On Thu, Jan 20, 2011 at 11:28 PM, James Kennedy
>> <james.kennedy@troove.net> wrote:
>>> I've come across a strange bug that I'm having trouble debugging.
>>> Basically I have a seed application that is executed via maven and runs a
>>> single JVM ApplicationStarter that starts up hdfs, regionserver, hmaster
>>> threads. It does some seeding then shuts those down in reverse order.
>>> So this isn't a typical way of running hbase to be sure. However it has
>>> always worked until I upgraded to HBase 0.90.0.
>>> I didn't notice it when I was originally testing 0.90.0 because it only
>>> seems to be happening on our EC2.small build server node when I run this
>>> particular seeder.
>>> Running the same thing locally on my mac works.
>>> Attached is the error output starting from when the HRegionServer.stop() is
>>> called to when HMaster.shutdown() is called and it starts looping forever in
>>> letRegionServersShutdown().
>>> It looks like RegionServerTracker is getting to "RegionServer ephemeral node
>>> deleted, processing expiration" but then because it can't get the
>>> HServerInfo it doesn't follow-through with actually expiring it.
>>> Does anyone have any ideas as to why this might be happening?
>>> Thanks,
>>> James Kennedy
>>> Project Manager
>>> Troove Inc.
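
The mismatch J-D points out above ("localhost." in the znode name versus "localhost" in the list of servers the master is waiting on) can be illustrated with a minimal Java sketch. This is not HBase code: the class ServerNameMismatch, the helpers serverKey and stripTrailingDot, and the plain HashMap standing in for ServerManager.onlineServers are all hypothetical, and the sketch only assumes that servers are tracked under a "host,port,startcode" string like the ones in the two log lines.

```java
import java.util.HashMap;
import java.util.Map;

public class ServerNameMismatch {

    // Hypothetical helper: builds a "host,port,startcode" key shaped like
    // the server names in the quoted log lines.
    static String serverKey(String host, int port, long startCode) {
        return host + "," + port + "," + startCode;
    }

    // Hypothetical normalization: drop one trailing dot, which in DNS
    // notation marks a fully qualified (absolute) name.
    static String stripTrailingDot(String host) {
        return host.endsWith(".") ? host.substring(0, host.length() - 1) : host;
    }

    public static void main(String[] args) {
        // Stand-in for the master's view of online servers (an assumption,
        // not the real ServerManager data structure).
        Map<String, String> onlineServers = new HashMap<>();
        onlineServers.put(serverKey("localhost", 60020, 1295592845214L), "server info");

        // Name as it appeared in the expired znode, with the trailing dot.
        String fromZnode = serverKey("localhost.", 60020, 1295592845214L);
        // The lookup misses, so the server is never marked as expired.
        System.out.println(onlineServers.containsKey(fromZnode));

        // After normalizing the host, the same lookup succeeds.
        String normalized =
            serverKey(stripTrailingDot("localhost."), 60020, 1295592845214L);
        System.out.println(onlineServers.containsKey(normalized));
    }
}
```

Because the two names differ by a single character, an exact string lookup fails silently, which is consistent with the master looping while it waits for a server it never sees expire. Removing the dns.interface overrides, as James did, avoids the mismatch at the source rather than normalizing it afterwards.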
