incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: 0.7.4: Replication assertion error after removetoken, removetoken force and a restart
Date Sun, 21 Aug 2011 11:35:59 GMT
There is some confusion in the ring about nodes leaving. Check nodetool ring from every node
and see if they agree. Check the logs to see if there is any information about which node is sending
the wrong message.
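
The ring comparison suggested above can be sketched as follows. This is a minimal sketch, not a Cassandra tool: the class RingViewCheck and its methods are hypothetical, and it assumes you have already captured the nodetool ring output from each node as lines of text. It also assumes each data row starts with an IPv4 address followed by the Status and State columns and ends with the token, which may not match every version's layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: compares "nodetool ring" output captured on each node. */
final class RingViewCheck {

    /**
     * Reduces one node's ring output to "address status state token" entries.
     * Assumption: a data row starts with an IPv4 address and ends with the token.
     * The columns in between (load, ownership) are ignored because they can
     * differ slightly between nodes without indicating a problem.
     */
    static Set<String> normalize(List<String> ringOutput) {
        Set<String> entries = new TreeSet<>();
        for (String line : ringOutput) {
            String[] f = line.trim().split("\\s+");
            // Skip the header row and short lines (blank lines, a lone token line).
            if (f.length < 4 || !Character.isDigit(f[0].charAt(0))) {
                continue;
            }
            entries.add(f[0] + " " + f[1] + " " + f[2] + " " + f[f.length - 1]);
        }
        return entries;
    }

    /** Groups node names by the normalized ring view each one reports. */
    static Map<String, List<String>> groupByView(Map<String, List<String>> outputByNode) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : outputByNode.entrySet()) {
            String view = String.join("\n", normalize(e.getValue()));
            groups.computeIfAbsent(view, k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    /** True when every node reports the same membership, status, state and tokens. */
    static boolean ringsAgree(Map<String, List<String>> outputByNode) {
        return groupByView(outputByNode).size() <= 1;
    }
}
```

If ringsAgree returns false, groupByView shows which nodes share a view, which helps narrow down the node whose state is out of step.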

Without knowing much more, you could try a rolling restart, but you may need a full restart
if the ring state is different; see
http://www.datastax.com/docs/0.7/troubleshooting/index#view-of-ring-differs-between-some-nodes

Hope that helps. 
 
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21/08/2011, at 5:38 AM, Anand Somani wrote:

> 0.7.4 / 3-node cluster / RF 3 / quorum read/write
> 
> After I re-introduced a corrupted node, I followed the process listed on the operations wiki
> to handle failures (thanks to folks on the mailing list for helping me).
> I am still doing a cleanup on one node at this point, but I noticed that I am seeing this
> same exception appear 10 to 12 times a minute on an existing node (not the new one). I think
> it started around the removetoken.
> 
> How do I solve this? Should I just restart this node? Are there any other cleanups/resets I need
> to do?
> 
> Thanks
> 
> 
> On Thu, Apr 28, 2011 at 2:26 AM, aaron morton <aaron@thelastpickle.com> wrote:
> I *think* that code is used when one node tells others via gossip it is removing a token
> that is not its own. The node that receives the information in gossip does some work and then
> replies to the first node with a REPLICATION_FINISHED message; the first node is the one I assume
> the error is happening on.
> 
> Have you been doing any moves / removes or additions of tokens/nodes?
> 
> Thanks
> Aaron
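
The handshake described in the message above can be modeled roughly as follows. This is a simplified, hypothetical model and not Cassandra's actual code: the class RemovalCoordinatorModel and its methods are invented for illustration. It assumes the assertion fires when a REPLICATION_FINISHED message arrives at a node that is not tracking any removal (for example because its in-memory state was lost in a restart), which the thread does not confirm.

```java
import java.util.HashSet;
import java.util.Set;

/** Simplified, hypothetical model of the removal handshake; not Cassandra's code. */
final class RemovalCoordinatorModel {

    // Nodes this coordinator still expects a REPLICATION_FINISHED message from.
    private final Set<String> pending = new HashSet<>();

    /** Called when this node announces via gossip that it is removing a token. */
    void startRemoval(Set<String> nodesTakingOverRanges) {
        pending.addAll(nodesTakingOverRanges);
    }

    /**
     * Called when a REPLICATION_FINISHED message arrives from another node.
     * Returns false if no removal is being tracked. Under the assumption stated
     * above, a bare assertion at this point would throw AssertionError instead.
     */
    boolean confirmReplication(String from) {
        if (pending.isEmpty()) {
            return false; // unexpected message: nothing is being coordinated
        }
        pending.remove(from);
        return true;
    }

    /** True once every expected node has confirmed. */
    boolean removalComplete() {
        return pending.isEmpty();
    }
}
```

In this model, a node that keeps resending its confirmation to a coordinator with no pending removal would produce the same failure repeatedly, which would be consistent with an error that recurs every few seconds.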
> 
> On 28 Apr 2011, at 08:39, Alexis Lê-Quôc wrote:
> 
> > Hi,
> >
> > I've been getting the following lately, every few seconds.
> >
> > 2011-04-27T20:21:18.299885+00:00 10.202.61.193 [MiscStage: 97] Error
> > in ThreadPoolExecutor
> > 2011-04-27T20:21:18.299885+00:00 10.202.61.193 java.lang.AssertionError
> > 2011-04-27T20:21:18.300038+00:00 10.202.61.193 10.202.61.193   at
> > org.apache.cassandra.service.StorageService.confirmReplication(StorageService.java:1872)
> > 2011-04-27T20:21:18.300038+00:00 10.202.61.193 10.202.61.193   at
> > org.apache.cassandra.streaming.ReplicationFinishedVerbHandler.doVerb(ReplicationFinishedVerbHandler.java:38)
> > 2011-04-27T20:21:18.300047+00:00 10.202.61.193 10.202.61.193   at
> > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
> > 2011-04-27T20:21:18.300047+00:00 10.202.61.193 10.202.61.193   at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > 2011-04-27T20:21:18.300055+00:00 10.202.61.193 10.202.61.193   at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > 2011-04-27T20:21:18.300055+00:00 10.202.61.193 10.202.61.193   at
> > java.lang.Thread.run(Thread.java:636)
> > 2011-04-27T20:21:18.300555+00:00 10.202.61.193 [MiscStage: 97] Fatal
> > exception in thread Thread[MiscStage:97,5,main]
> >
> > I see it coming from
> > 32 public class ReplicationFinishedVerbHandler implements IVerbHandler
> > 33 {
> > 34     private static Logger logger = LoggerFactory.getLogger(ReplicationFinishedVerbHandler.class);
> > 35
> > 36     public void doVerb(Message msg, String id)
> > 37     {
> > 38         StorageService.instance.confirmReplication(msg.getFrom());
> > 39         Message response = msg.getInternalReply(ArrayUtils.EMPTY_BYTE_ARRAY);
> > 40         if (logger.isDebugEnabled())
> > 41             logger.debug("Replying to " + id + "@" + msg.getFrom());
> > 42         MessagingService.instance().sendReply(response, id, msg.getFrom());
> > 43     }
> > 44 }
> >
> > Before I dig deeper in the code, has anybody dealt with this before?
> >
> > Thanks,
> >
> > --
> > Alexis Lê-Quôc
> 
> 

