hbase-user mailing list archives

From Nicolas Liochon <nkey...@gmail.com>
Subject Re: Where is HBase failed servers list stored
Date Thu, 05 Mar 2015 17:42:21 GMT
I agree with Bryan.
On 5 March 2015 at 17:55, "Bryan Beaudreault" <bbeaudreault@hubspot.com> wrote:

> You should run with a backup master in a production cluster.  The failover
> process works very well and will cause no downtime.  I've done it literally
> hundreds of times across our multiple production hbase clusters.
>
> Even if you don't have a backup master, you should still be fine with
> restarting the master.  It can handle a brief blip without any problems,
> from what I've seen.  The master is really only used for coordination such
> as region moves, RS failovers, etc.  Your clients can still retrieve data
> from your regionservers, as long as no servers die in the brief moment you
> are masterless.
>
> On Thu, Mar 5, 2015 at 5:53 AM, Sandeep Reddy <sandeepvreddy@outlook.com>
> wrote:
>
> > Since ours is a production cluster, we can't restart the master.
> > I tested this scenario in our test cluster, and it was resolved after
> > restarting the master.
> > Other than restarting the master, I couldn't find any solution.
> > Thanks, Sandeep.
> >
> > > From: nkeywal@gmail.com
> > > Date: Wed, 4 Mar 2015 14:55:03 +0100
> > > Subject: Re: Where is HBase failed servers list stored
> > > To: user@hbase.apache.org
> > >
> > > If I understand the issue correctly, restarting the master should solve
> > > the problem.
> > >
> > > On Wed, Mar 4, 2015 at 5:55 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > Please see HBASE-13067 Fix caching of stubs to allow IP address changes
> > > > of restarted remote servers
> > > >
> > > > Cheers
> > > >
> > > > On Tue, Mar 3, 2015 at 8:26 PM, Sandeep L <sandeepvreddy@outlook.com>
> > > > wrote:
> > > >
> > > > > Hi nkeywal,
> > > > > While trying to get more details about this issue, I found that
> > > > > HMaster is trying to connect to the wrong IP address.
> > > > > Here is the exact issue:
> > > > > For unavoidable reasons we were forced to change the IP address of a
> > > > > regionserver, and we then updated the new IP address in the /etc/hosts
> > > > > file across all HBase servers. I started the RegionServer from the
> > > > > master with the start-hbase.sh script, and jps output on the
> > > > > regionserver shows that the regionserver process is up and running.
> > > > > But when running the hbase balancer, HMaster tries to connect to the
> > > > > old IP address instead of the new one.
> > > > > One more thing: when I checked the regionserver status on port 60010,
> > > > > it shows as up and running.
> > > > > Thanks, Sandeep.
> > > > >
> > > > > > From: nkeywal@gmail.com
> > > > > > Date: Tue, 3 Mar 2015 19:01:01 +0100
> > > > > > Subject: Re: Where is HBase failed servers list stored
> > > > > > To: user@hbase.apache.org
> > > > > >
> > > > > > It's in local memory. When HBase cannot connect to a server, it puts it
> > > > > > into the "failedServerList" for 2 seconds. This is to avoid having all
> > > > > > the threads going into a potentially long socket timeout. Are you sure
> > > > > > that you can connect from the master to this machine/port?
> > > > > >
> > > > > > You can change the time it stays in the list with
> > > > > > hbase.ipc.client.failed.servers.expiry (in milliseconds), but it should
> > > > > > not help.
> > > > > >
> > > > > > You should have another exception before this one in the logs (the one
> > > > > > that initially put this region server in this failedServerList).
> > > > > >
> > > > > > On Tue, Mar 3, 2015 at 12:08 PM, Sandeep L <sandeepvreddy@outlook.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > While trying to run the hbase balancer I am getting the error message
> > > > > > > "This server is in the failed servers list". Because of this, the
> > > > > > > cluster is not getting balanced.
> > > > > > > Even though the regionserver is up and running, HMaster is unable to
> > > > > > > connect to it.
> > > > > > > The odd thing is that HMaster is able to start the regionserver and
> > > > > > > detects it as up and running, but it is unable to assign regions to it.
> > > > > > > Can someone suggest a solution for this?
> > > > > > > Following is the full stack trace:
> > > > > > > org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: host1/192.168.2.20:60020
> > > > > > >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
> > > > > > >     at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
> > > > > > >     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
> > > > > > >     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
> > > > > > >     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
> > > > > > >     at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
> > > > > > >     at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
> > > > > > >     at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:999)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager$6.run(AssignmentManager.java:1447)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1260)
> > > > > > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > > > > >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > > > > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > > > >     at java.lang.Thread.run(Thread.java:745)
> > > > > > > Thanks,Sandeep.
> > > > >
> > > > >
> > > >
> >
> >
>
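
For readers following the thread, here is a rough sketch of the mechanism
described in the quoted text: an in-memory list that remembers a failed
server address for a short time so that other threads fail fast instead of
waiting on a socket timeout. This is only an illustration under stated
assumptions, not HBase's actual code. The class name FailedServersSketch and
its methods are hypothetical; only the 2-second default and the address
format are taken from the thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not the HBase implementation. */
public class FailedServersSketch {
    // Maps a server address to the time (in ms) at which its entry expires.
    private final Map<String, Long> expiryByAddress = new HashMap<>();
    private final long expiryMillis;

    public FailedServersSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Record a connection failure observed at nowMillis. */
    public synchronized void addFailure(String address, long nowMillis) {
        expiryByAddress.put(address, nowMillis + expiryMillis);
    }

    /** True while the address is still inside its expiry window. */
    public synchronized boolean isFailed(String address, long nowMillis) {
        Long expiry = expiryByAddress.get(address);
        if (expiry == null) {
            return false;
        }
        if (nowMillis >= expiry) {
            // The window has passed: forget the entry so callers retry.
            expiryByAddress.remove(address);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // 2000 ms mirrors the "2 seconds" mentioned in the thread.
        FailedServersSketch failed = new FailedServersSketch(2000);
        String address = "host1/192.168.2.20:60020";
        failed.addFailure(address, 0);
        System.out.println(failed.isFailed(address, 1500)); // prints true
        System.out.println(failed.isFailed(address, 2500)); // prints false
    }
}
```

Because entries expire on their own, a list like this only short-circuits
repeated attempts; it does not explain why the first connection attempt
failed, which is consistent with the advice above to look for the earlier
exception in the logs.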
