Subject: Re: Where is HBase failed servers list stored
From: Ted Yu
To: "user@hbase.apache.org"
Date: Tue, 3 Mar 2015 20:55:37 -0800

Please see HBASE-13067 Fix caching of stubs to allow IP address changes of restarted remote servers

Cheers

On Tue, Mar 3, 2015 at 8:26 PM, Sandeep L wrote:

> Hi nkeywal,
> While trying to get more details about this issue, I found out that
> HMaster is trying to connect to the wrong IP address.
> Here is the exact issue:
> For unavoidable reasons we were forced to change the IP address of the
> regionserver, and then updated the new IP address in the /etc/hosts file
> across all HBase servers. I started the RegionServer from the master with
> the start-hbase.sh script, and jps output on the regionserver shows that
> the regionserver process is up and running.
> But when running hbase balancer, HMaster tries to connect to the old IP
> address instead of the new IP address.
> One more thing: when I checked the regionserver status on port 60010,
> it shows as up and running.
> Thanks, Sandeep.
>
> > From: nkeywal@gmail.com
> > Date: Tue, 3 Mar 2015 19:01:01 +0100
> > Subject: Re: Where is HBase failed servers list stored
> > To: user@hbase.apache.org
> >
> > It's in local memory. When HBase cannot connect to a server, it puts it
> > into the "failedServerList" for 2 seconds. This is to avoid having all the
> > threads going into a potentially long socket timeout. Are you sure that you
> > can connect from the master to this machine/port?
> >
> > You can change the time it stays in the list with
> > hbase.ipc.client.failed.servers.expiry (in milliseconds), but it should not
> > help.
> >
> > You should have another exception before this one in the logs (the one that
> > initially put this region server in this failedServerList).
> >
> > On Tue, Mar 3, 2015 at 12:08 PM, Sandeep L wrote:
> >
> > > Hi,
> > > While trying to run hbase balancer I am getting the error message "This
> > > server is in the failed servers list". Because of this the cluster is not
> > > getting balanced.
> > > Even though the regionserver is up and running, hmaster is unable to
> > > connect to it.
> > > The odd thing here is that hmaster is able to start the regionserver and
> > > it is detected as up and running, but hmaster is unable to assign regions.
> > > Can someone suggest a solution for this?
> > > Following is the full stack trace:
> > > org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: host1/192.168.2.20:60020
> > >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
> > >     at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
> > >     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
> > >     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
> > >     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
> > >     at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
> > >     at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
> > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
> > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
> > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
> > >     at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
> > >     at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:999)
> > >     at org.apache.hadoop.hbase.master.AssignmentManager$6.run(AssignmentManager.java:1447)
> > >     at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1260)
> > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >     at java.lang.Thread.run(Thread.java:745)
> > > Thanks, Sandeep.
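
The behavior nkeywal describes in this thread (an in-memory list where a server that could not be reached is held for 2 seconds, or for whatever hbase.ipc.client.failed.servers.expiry is set to, so that other threads fail fast instead of waiting on a socket timeout) can be sketched roughly as below. This is only an illustration under stated assumptions, not the HBase implementation: the class name FailedServerListSketch and the methods markFailed and isFailed are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a time-limited "failed servers" list.
// Names and structure are assumptions for illustration, not HBase code.
class FailedServerListSketch {
    // How long an entry stays in the list, in milliseconds.
    private final long expiryMs;
    // Server address ("host:port") -> time in ms at which the entry expires.
    private final Map<String, Long> expiresAt = new HashMap<>();

    FailedServerListSketch(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    // Record that a connection attempt to the address failed at time nowMs.
    synchronized void markFailed(String address, long nowMs) {
        expiresAt.put(address, nowMs + expiryMs);
    }

    // True while the address is still inside its expiry window;
    // expired entries are dropped on lookup.
    synchronized boolean isFailed(String address, long nowMs) {
        Long deadline = expiresAt.get(address);
        if (deadline == null) {
            return false;
        }
        if (nowMs >= deadline) {
            expiresAt.remove(address);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // 2000 ms mirrors the 2 seconds mentioned in the thread.
        FailedServerListSketch list = new FailedServerListSketch(2000);
        list.markFailed("host1:60020", 0);
        System.out.println(list.isFailed("host1:60020", 500));   // prints "true"
        System.out.println(list.isFailed("host1:60020", 2500));  // prints "false"
    }
}
```

In a sketch like this, a longer expiry only delays the next real connection attempt. That is consistent with the remark above that changing the setting should not help when the underlying connection keeps failing, for example because the caller is still using the old IP address.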