hbase-user mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: hbase replication to slave retry
Date Mon, 14 Dec 2015 05:51:24 GMT
Replication from the master cluster will retry the failed batch.
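
As a rough illustration of what "retry" means here, the loop below keeps re-sending the same batch until the remote side accepts it, sleeping a little longer after each failure up to a cap. This is a minimal sketch, not HBase's actual ReplicationSource code: the class name, method names, and the fake sink in main are all hypothetical. The source-side settings that appear to govern the real sleep and cap are replication.source.sleepforretries and replication.source.maxretriesmultiplier; verify them against the documentation for your HBase version.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch only; this is NOT HBase's ReplicationSource code.
public class ReplicationRetrySketch {

    // Delay before the next attempt: base sleep times the attempt number,
    // with the multiplier capped so the wait stops growing.
    static long backoffMillis(long baseSleepMillis, int attempt, int maxMultiplier) {
        return baseSleepMillis * Math.min(attempt, maxMultiplier);
    }

    // Calls ship until it returns true (batch accepted by the sink);
    // returns the total number of attempts made. The same batch is re-sent
    // each time, so nothing is skipped while the sink is unavailable.
    static int shipUntilAccepted(BooleanSupplier ship, long baseSleepMillis, int maxMultiplier) {
        int attempt = 1;
        while (!ship.getAsBoolean()) {
            try {
                Thread.sleep(backoffMillis(baseSleepMillis, attempt, maxMultiplier));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", e);
            }
            attempt++;
        }
        return attempt;
    }

    public static void main(String[] args) {
        // Hypothetical sink that rejects the first three batches
        // (standing in for a NotServingRegionException on the slave).
        int[] calls = {0};
        int attempts = shipUntilAccepted(() -> ++calls[0] > 3, 10L, 2);
        System.out.println("accepted after " + attempts + " attempts");
    }
}
```

The point of the sketch is that the sender does not advance past a batch until it has been acknowledged, which is why a WARN on one attempt does not by itself mean the edits were dropped.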

-Anoop-

On Sat, Dec 12, 2015 at 6:05 AM, Abraham Tom <work2much@gmail.com> wrote:

> I have 2 clusters (1 master and 1 slave) on CDH 5.4, HBase 1.0.
> Replication is working 95% of the time,
> but I do get the following WARN, which I consider an error:
>
>
> Can't replicate because of an error on the remote cluster:
>
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException):
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> Failed 11 actions: NotServingRegionException: 11 times,
>         at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227)
>         at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207)
>         at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1563)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1003)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1017)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:236)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.replicateEntries(ReplicationSink.java:160)
>         at org.apache.hadoop.hbase.replication.regionserver.Replication.replicateLogEntries(Replication.java:198)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1584)
>         at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20880)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>         at java.lang.Thread.run(Thread.java:745)
>
>
>
> I consider this an error because my slave is missing data that I have on
> the master. Is there a setting in HBase to keep trying to send?
> Cloudera management does try to restart and alerts me if the region
> dies for some reason. As to why it dies, I am looking into it, and that
> is a different problem. But when the slave returns, I expect the
> unconfirmed records to be resent.
>
> Best practices would be helpful as well.
> All ZooKeeper nodes in the slave cluster are listed as peers.
>
>
> --
> Abraham Tom
> Email:   work2much@gmail.com
> Phone:  415-515-3621
>
