hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7293) [replication] Remove dead sinks from ReplicationSource.currentPeers, it's spammy
Date Fri, 28 Dec 2012 23:06:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540631#comment-13540631 ]

Lars Hofhansl commented on HBASE-7293:

Can anybody else have a quick look? Otherwise I'll move it to 0.94.5.
> [replication] Remove dead sinks from ReplicationSource.currentPeers, it's spammy
> --------------------------------------------------------------------------------
>                 Key: HBASE-7293
>                 URL: https://issues.apache.org/jira/browse/HBASE-7293
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.3, 0.96.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.4
>         Attachments: 7293-0.94.txt, 7293-0.94-v2.txt, 7293-0.96.txt
> I happened to look at a log today where I saw a lot of lines like this:
> {noformat}
> 2012-12-06 23:29:08,318 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: This server is in the failed servers list: sv4r20s49/
> 2012-12-06 23:29:15,987 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of a local or network error:
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:519)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:484)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:416)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:462)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1150)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1000)
> 	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
> 	at $Proxy14.replicateLogEntries(Unknown Source)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:627)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365)
> 2012-12-06 23:29:15,988 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: Connection refused
> {noformat}
> What struck me as weird is that this had been going on for some days; I would expect the RS to find new servers if it wasn't able to replicate. But the reality is that only a few of the chosen sink RS were down, so eventually the source hits one that's good and never gets to refresh its list of servers.
> We should remove the dead servers, it's spammy and probably adds some slave lag.
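
The behavior described above can be illustrated with a minimal sketch. This is not the attached patch and not the actual ReplicationSource code: the class name SinkPool, its methods, and the Supplier that stands in for re-choosing sinks from the slave cluster are all hypothetical. It only shows the idea of dropping a sink from the current list once a connection to it fails, and re-choosing sinks when none are left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not the HBASE-7293 patch.
class SinkPool {
    // Assumption: stands in for whatever re-reads the slave cluster's region servers.
    private final Supplier<List<String>> sinkChooser;
    private final List<String> currentPeers = new ArrayList<>();
    private final Random random;

    SinkPool(Supplier<List<String>> sinkChooser, Random random) {
        this.sinkChooser = sinkChooser;
        this.random = random;
        refresh();
    }

    // Replace the current list with a freshly chosen set of sinks.
    final void refresh() {
        currentPeers.clear();
        currentPeers.addAll(sinkChooser.get());
    }

    // Pick a random sink, re-choosing first if every sink has been dropped.
    String pickSink() {
        if (currentPeers.isEmpty()) {
            refresh();
        }
        if (currentPeers.isEmpty()) {
            throw new IllegalStateException("no sinks available");
        }
        return currentPeers.get(random.nextInt(currentPeers.size()));
    }

    // Call after a connection failure so the dead sink is not retried
    // (and not logged about) on every subsequent shipment.
    void reportDead(String sink) {
        currentPeers.remove(sink);
    }

    // Defensive copy of the sinks currently in use.
    List<String> currentPeers() {
        return new ArrayList<>(currentPeers);
    }
}
```

In this sketch a caller would invoke reportDead(sink) from the place where the ConnectException is caught, then call pickSink() again. Refreshing only when the list is empty is a deliberate simplification; refreshing once the list shrinks below some ratio would be another reasonable choice.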
