hbase-dev mailing list archives

From James Kennedy <james.kenn...@troove.net>
Subject Re: HMaster won't die waiting for RegionServer that is already dead
Date Sat, 22 Jan 2011 00:44:55 GMT
Aha, that stupid dot!

My /etc/hosts file looks pretty standard:

127.0.0.1 localhost

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

However, look what I found in the data-seed-specific hbase-site.xml:

<property>
        <name>hbase.master.dns.interface</name>
        <value>lo</value>
</property>
<property>
       <name>hbase.regionserver.dns.interface</name>
       <value>lo</value>
</property>

Not sure why we had that in there originally, but taking it out fixes the problem. Both sides
now resolve hregioninfo to "localhost" instead of "localhost.". I have no idea how specifying
the lo interface adds a period to the localhost name, but that sounds like a bug to me. Shall
I report it, or is this a known issue?
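
To illustrate why one extra dot is enough to hang the shutdown, here is a minimal sketch of the mismatch, using the two server names from the log output quoted below. The map is only a stand-in for ServerManager.onlineServers, and stripTrailingDot and normalizeServerName are hypothetical helpers written for this example; none of this is HBase's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class ServerNameMismatch {
    // Hypothetical helper: drop a single trailing dot from a host name.
    static String stripTrailingDot(String host) {
        return host.endsWith(".") ? host.substring(0, host.length() - 1) : host;
    }

    // Hypothetical helper: normalize the host part of "host,port,startcode".
    static String normalizeServerName(String serverName) {
        int comma = serverName.indexOf(',');
        if (comma < 0) {
            return stripTrailingDot(serverName);
        }
        return stripTrailingDot(serverName.substring(0, comma)) + serverName.substring(comma);
    }

    public static void main(String[] args) {
        // Stand-in for the master's view of online servers (an assumption, not HBase code).
        Map<String, String> onlineServers = new HashMap<>();
        onlineServers.put("localhost,60020,1295592845214", "server info");

        // Name as it appeared in the expired znode, with the trailing dot.
        String fromZnode = "localhost.,60020,1295592845214";

        // Exact string lookup misses, so the server would never be expired.
        System.out.println(onlineServers.containsKey(fromZnode)); // false

        // After normalizing the host part, the lookup succeeds.
        System.out.println(onlineServers.containsKey(normalizeServerName(fromZnode))); // true
    }
}
```

The lookup is an exact string match, so "localhost." and "localhost" count as two different servers unless both sides produce the name the same way.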

Thanks for your help,

James Kennedy
Project Manager
Troove Inc.

On 2011-01-21, at 1:34 PM, Jean-Daniel Cryans wrote:

> There's some sort of mismatch:
> 
> RegionServer ephemeral node deleted, processing expiration
> [localhost.,60020,1295592845214]
> 
> and
> 
> Waiting on regionserver(s) to go down localhost,60020,1295592845214
> 
> 
> Do you see the dot after "localhost" in the first line? I wonder how
> it got different in the znode and in ServerManager.onlineServers... In
> any case, I'm pretty sure you can get it working by playing with your
> /etc/hosts
> 
> J-D
> 
> On Thu, Jan 20, 2011 at 11:28 PM, James Kennedy
> <james.kennedy@troove.net> wrote:
>> I've come across a strange bug that I'm having trouble debugging.
>> Basically I have a seed application that is executed via maven and runs a
>> single JVM ApplicationStarter that starts up hdfs, regionserver, hmaster
>> threads. It does some seeding then shuts those down in reverse order.
>> So this isn't a typical way of running hbase to be sure. However it has
>> always worked until I upgraded to HBase 0.90.0.
>> I didn't notice it when I was originally testing 0.90.0 because it only
>> seems to be happening on our EC2.small build server node when I run this
>> particular seeder.
>> Running the same thing locally on my mac works.
>> Attached is the error output starting from when the HRegionServer.stop() is
>> called to when HMaster.shutdown() is called and it starts looping forever in
>> letRegionServersShutdown().
>> It looks like RegionServerTracker is getting to "RegionServer ephemeral node
>> deleted, processing expiration" but then because it can't get the
>> HServerInfo it doesn't follow-through with actually expiring it.
>> Does anyone have any ideas as to why this might be happening?
>> 
>> 
>> Thanks,
>> James Kennedy
>> Project Manager
>> Troove Inc.
>> 
>> 

