hbase-dev mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Replication hosed after simple cluster restart
Date Thu, 14 Mar 2013 01:22:11 GMT
I suppose the problem could be in zkHelper.copyQueuesFromRSUsingMulti(rsZnode) as called from
copyQueuesFromRSUsingMulti will return the queues it read even when the multi operation failed
(because another RS managed to execute it first).
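
A minimal sketch of the suspected failure mode, in plain Java with no HBase or ZooKeeper dependency. Everything here is an assumption made for illustration: FakeZk, atomicMove, claimReturningStaleRead and claimChecked are hypothetical names, not HBase APIs, and the real copyQueuesFromRSUsingMulti may be structured differently. The sketch only shows why returning the queues that were read, regardless of whether the atomic move succeeded, lets two region servers believe they own the same queues.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueueClaimSketch {

    /** Hypothetical in-memory stand-in for the znodes holding a dead RS's queues. */
    static final class FakeZk {
        private final Map<String, List<String>> queuesByRs = new HashMap<>();

        void put(String rsZnode, List<String> queues) {
            queuesByRs.put(rsZnode, new ArrayList<>(queues));
        }

        /** Returns a copy of the queues currently stored under rsZnode (empty if none). */
        List<String> read(String rsZnode) {
            List<String> queues = queuesByRs.get(rsZnode);
            return queues == null ? new ArrayList<>() : new ArrayList<>(queues);
        }

        /** Stand-in for the atomic multi: only the first caller succeeds. */
        boolean atomicMove(String rsZnode) {
            return queuesByRs.remove(rsZnode) != null;
        }
    }

    /** Suspected behavior: the earlier read is returned even if the move failed. */
    static List<String> claimReturningStaleRead(FakeZk zk, String rsZnode, Runnable raceWindow) {
        List<String> queues = zk.read(rsZnode);
        raceWindow.run();        // another region server may win the race here
        zk.atomicMove(rsZnode);  // outcome ignored
        return queues;
    }

    /** Guarded variant: report queues only if this caller's move succeeded. */
    static List<String> claimChecked(FakeZk zk, String rsZnode, Runnable raceWindow) {
        List<String> queues = zk.read(rsZnode);
        raceWindow.run();
        return zk.atomicMove(rsZnode) ? queues : Collections.<String>emptyList();
    }
}
```

Passing a raceWindow that calls atomicMove on the same znode simulates the other region server executing the multi first. In that case the first variant still hands back the queues it read, while the guarded variant hands back an empty list.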

-- Lars

 From: lars hofhansl <larsh@apache.org>
To: hbase-dev <dev@hbase.apache.org> 
Sent: Wednesday, March 13, 2013 6:12 PM
Subject: Replication hosed after simple cluster restart
We just ran into an interesting scenario. We restarted a cluster that was set up as a replication
The stop went cleanly.

Upon restart *all* regionservers aborted within a few seconds with variations of these errors:

This is scary!

-- Lars