hbase-user mailing list archives

From "Lu, Wei" <...@microstrategy.com>
Subject RE: number of region servers is wrong
Date Fri, 24 Feb 2012 01:20:05 GMT
Thank you very much for the suggestions. I asked the cluster admin to add the IP addresses
and hostnames to /etc/hosts on each region server and on the master machine, and that works.
It seems to be a problem related to reverse DNS.

Regards,
Wei
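
One way to confirm a reverse DNS mismatch like the one described above is to resolve each node's hostname to an IP and then resolve that IP back to a name. The following is only a minimal sketch: the function name check_forward_reverse and the short-name comparison are assumptions made for illustration, not anything HBase itself provides. The resolver functions are parameters so the check can be exercised without a live DNS server.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return (ip, reverse_name, consistent) for one hostname.

    `forward` maps a hostname to an IP string; `reverse` maps an IP to a
    (name, aliases, addresses) tuple, like socket.gethostbyaddr.
    """
    ip = forward(hostname)
    try:
        reverse_name = reverse(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError;
        # treat a failed reverse lookup as "no reverse name".
        reverse_name = None

    # Assumption: compare short names only, so "wlu-rs1" and
    # "wlu-rs1.example.com" count as the same host.
    consistent = (
        reverse_name is not None
        and reverse_name.split(".")[0].lower() == hostname.split(".")[0].lower()
    )
    return ip, reverse_name, consistent


# Example use on a cluster node (performs real lookups):
#   for host in ("wlu-rs1", "wlu-rs2"):
#       print(host, check_forward_reverse(host))
```

If the third element is False on the master or on any region server, the name the master sees for a region server may differ from the name the region server reports, which is consistent with the /etc/hosts fix helping here.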

-----Original Message-----
From: tvinod@socialyantra.com [mailto:tvinod@socialyantra.com] On Behalf Of T Vinod Gupta
Sent: Friday, February 24, 2012 7:47 AM
To: user@hbase.apache.org
Subject: Re: number of region servers is wrong

I remember seeing this error in our deployment as well. Can you check your
GC logs to see if there are long GC times? Also look at your ZooKeeper logs
to see what's going on.
I tried a bunch of things, so I'm not sure what worked, but what I did was
increase the ZooKeeper connection and timeout limits, and that did the trick,
IIRC.

Thanks

On Wed, Feb 22, 2012 at 10:55 PM, Lu, Wei <wlu@microstrategy.com> wrote:

> Hi,
>
> I ran into a weird problem when using HBase. There are 3 machines: 1
> master and 2 region servers (wlu-rs1/10.27.17.251 and wlu-rs2/10.27.16.11).
> But when I use "status 'detailed'" to see the region servers' status, it shows
> three servers, and one server appears twice (exactly the same).
> 3 live servers
> 10.27.17.251:60020 1329975187706
> 10.27.16.11:60020 1329975209046
> 10.27.17.251:60020 1329975187706
>
> When balancing begins, region server 10.27.17.251 seems to move data from and
> to itself, and a FATAL error occurs.
>
> Log info of HMaster:
>
> 2012-02-23 00:01:00,629 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=usertable,user172022781,1329972455493.943849e136aa6f7a343d47fed57da429., src=wlu-rs1,60020,1329968056162, dest=10.27.17.251,60020,1329968056162
> 2012-02-23 00:01:00,629 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user172022781,1329972455493.943849e136aa6f7a343d47fed57da429. (offlining)
> 2012-02-23 00:01:09,712 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/ad483f3806a03756f3f47cd8bd220d09 (region=usertable,user819517397,1329972500402.ad483f3806a03756f3f47cd8bd220d09., server=wlu-rs1,60020,1329968056162, state=RS_ZK_REGION_CLOSING)
> 2012-02-23 00:01:09,712 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=wlu-rs1,60020,1329968056162, region=ad483f3806a03756f3f47cd8bd220d09
> 2012-02-23 00:01:12,678 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
> java.io.IOException: Call to /10.27.17.251:60020 failed on local exception: java.io.EOFException
>         at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:806)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>         at $Proxy6.closeRegion(Unknown Source)
>         at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:601)
>         at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1123)
>         at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1070)
>         at org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1930)
>         at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:694)
>         at org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:585)
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
> 2012-02-23 00:01:12,680 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2012-02-23 00:01:12,680 INFO org.apache.hadoop.hbase.master.HMaster: balance
>
>
> I use HBase 0.90.3 and Hadoop 0.20.2. Can anyone please help me figure this
> out?
>
>
>
> Regards,
> Wei
>
>
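
In the balance log line quoted above, the source (wlu-rs1,60020,1329968056162) and destination (10.27.17.251,60020,1329968056162) differ only in the host part, which suggests the same region server is known under two names. A rough sketch of how such aliases could be spotted in a list of "host,port,startcode" strings follows. The function find_aliased_servers is hypothetical and is not part of HBase, and matching on port plus start code alone is a heuristic assumption, since two distinct servers could in principle share both values. The third entry in the sample list is assembled from the status output for illustration.

```python
from collections import defaultdict


def find_aliased_servers(server_names):
    """Group 'host,port,startcode' strings by (port, startcode).

    Returns only the groups that contain more than one distinct host
    string, i.e. candidates for one server registered under two names.
    """
    groups = defaultdict(set)
    for name in server_names:
        # Split from the right so the host part is kept intact.
        host, port, startcode = name.rsplit(",", 2)
        groups[(int(port), int(startcode))].add(host)
    return {key: sorted(hosts) for key, hosts in groups.items() if len(hosts) > 1}


servers = [
    "wlu-rs1,60020,1329968056162",       # src from the balance log line
    "10.27.17.251,60020,1329968056162",  # dest from the same log line
    "10.27.16.11,60020,1329975209046",   # second server, built from the status output
]
print(find_aliased_servers(servers))
```

A non-empty result points at a hostname versus IP mismatch of the kind that the /etc/hosts change resolved in this thread.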
