hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: Tight loop connecting to master
Date Wed, 01 Jun 2011 02:59:49 GMT
Like you say, it should be gone in 0.92.x.

On each regionserver report, we'd deserialize an HServerAddress
instance.  As part of deserialization, we'd create an InetSocketAddress
instance, and creating it does a DNS resolve.  In the HServerAddress
(HSA) constructor, if the InetSocketAddress failed to resolve, we'd
throw the IllegalArgumentException below.
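
A minimal sketch of the kind of check described above, not the actual
HServerAddress code: constructing a java.net.InetSocketAddress from a
hostname attempts a lookup, and isUnresolved() reports whether it
failed.  The class and method names (ResolveCheck, checkResolved) are
made up for illustration; the message text follows the log quoted below.

```java
import java.net.InetSocketAddress;

// Hypothetical illustration only; this is not HBase code.
final class ResolveCheck {
    // Rejects an address whose hostname could not be resolved,
    // mirroring the behaviour described in the message above.
    static InetSocketAddress checkResolved(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                "Could not resolve the DNS name of "
                    + addr.getHostString() + ":" + addr.getPort());
        }
        return addr;
    }

    public static void main(String[] args) {
        // new InetSocketAddress(host, port) tries to resolve the host
        // at construction time; an IP literal needs no lookup.
        InetSocketAddress ok = checkResolved(new InetSocketAddress("127.0.0.1", 60000));
        System.out.println("resolved port " + ok.getPort());

        // createUnresolved skips the lookup, which simulates a DNS
        // outage deterministically (the hostname is a placeholder).
        try {
            checkResolved(InetSocketAddress.createUnresolved("master.example.invalid", 60000));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the failure surfaces as an unchecked
exception at construction time, so every caller that builds the address
on each report has to be ready for it.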

Not sure what you can do about it in 0.90.x w/o major surgery.  I
suppose you could just catch the exception and drop the report on the
ground until resolve works again.
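
As a rough sketch of that workaround, assuming the address lookup can be
wrapped in one place: catch the IllegalArgumentException, skip the
report for this cycle, and let the next cycle try again.  The names
here (DropReportSketch, tryReport, the Supplier argument) are
hypothetical stand-ins, not the 0.90.x HRegionServer API.

```java
import java.net.InetSocketAddress;
import java.util.function.Supplier;

// Hypothetical illustration only; this is not HBase code.
final class DropReportSketch {
    // Returns true if the report could be attempted, false if it was
    // dropped because the master address failed to resolve.
    static boolean tryReport(Supplier<InetSocketAddress> masterAddress) {
        try {
            InetSocketAddress addr = masterAddress.get(); // may throw while DNS is down
            // A real server would send the report to addr here.
            return addr != null;
        } catch (IllegalArgumentException e) {
            // Resolve failed: drop this report; the caller retries on
            // its next regular cycle.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean dropped = !tryReport(() -> {
            throw new IllegalArgumentException("Could not resolve the DNS name");
        });
        System.out.println("dropped while DNS down: " + dropped);

        boolean sent = tryReport(() -> new InetSocketAddress("127.0.0.1", 60000));
        System.out.println("attempted once resolvable: " + sent);
    }
}
```

Whether to sleep or back off between dropped reports is a separate
choice that this sketch leaves to the caller.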

St.Ack

On Tue, May 31, 2011 at 7:21 PM, Todd Lipcon <todd@cloudera.com> wrote:
> We had a QA cluster which got left on for a while during some
> maintenance to DNS/etc in our colo... everything is fine in the RS
> logs until:
>
> 2011-05-14 23:11:46,154 ERROR org.apache.hadoop.hbase.HServerAddress:
> Could not resolve the DNS name of c0505.hal.cloudera.com:60000
> 2011-05-14 23:11:46,154 WARN
> org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
> java.lang.IllegalArgumentException: Could not resolve the DNS name of
> c0505.hal.cloudera.com:60000
>        at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
>        at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66)
>        at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.getMasterAddress(HRegionServer.java:1469)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1442)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:742)
>        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:591)
>        at java.lang.Thread.run(Thread.java:619)
> 2011-05-14 23:12:14,175 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect
> to Master server at c0505.hal.cloudera.com:60000
> 2011-05-14 23:12:14,177 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to
> master at c0505.hal.cloudera.com:60000
> 2011-05-14 23:12:14,178 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect
> to Master server at c0505.hal.cloudera.com:60000
> 2011-05-14 23:12:14,179 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to
> master at c0505.hal.cloudera.com:60000
> followed by many GB of the above two messages alternating.
>
> This is something close to an 0.90.1 plus a few patches here and
> there... this ring a bell for anyone or should I dig? Looks like in
> trunk it's mostly rewritten by HBASE-3827/HBASE-1502. I do have
> HBASE-3545 in the build.
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
