Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A32FD9517 for ; Thu, 19 Apr 2012 11:06:40 +0000 (UTC) Received: (qmail 16329 invoked by uid 500); 19 Apr 2012 11:06:37 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 16297 invoked by uid 500); 19 Apr 2012 11:06:37 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 16287 invoked by uid 99); 19 Apr 2012 11:06:37 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 19 Apr 2012 11:06:37 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS,T_FRT_FOLLOW1 X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [208.113.200.5] (HELO homiemail-a46.g.dreamhost.com) (208.113.200.5) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 19 Apr 2012 11:06:30 +0000 Received: from homiemail-a46.g.dreamhost.com (localhost [127.0.0.1]) by homiemail-a46.g.dreamhost.com (Postfix) with ESMTP id 4ABAD3E4058 for ; Thu, 19 Apr 2012 04:06:07 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=thelastpickle.com; h=from :mime-version:content-type:subject:date:in-reply-to:to :references:message-id; q=dns; s=thelastpickle.com; b=vtq7MChTRo pGwTUqmLuEdIHP5LX34Pv+UGisPwamXX7dF8Wv+yAX1r/a3bAvKzjyjAtP4XjPGH CPyDZKilJ0sJQoSjGsRMbr0/DMmXugw0cdhrYBltbSSiYWckV6btBeFZZKU/6dMJ efLi3TUwOid2uNdiYbEEmRnc8bynC1qbw= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=thelastpickle.com; h=from :mime-version:content-type:subject:date:in-reply-to:to :references:message-id; s=thelastpickle.com; bh=GQ2fLRg3h9moyD78 2KjUrGZaEnI=; b=USmZ+eFQN4egb/7+qAuX42GDdxcRO5B5rQj5NsHxh4b0y1F/ glnE5oxee7W9nJMRO5WCpl78MUGuHI7hKulxuXKf7TmeWauMxDbRzsNf5m+VEVji 2KJF8qjf66yLfKPTFIOWRCCWrdWTk07+9tt2xbH2ZewSP+N5peJ0zXuLMPc= Received: from [172.16.1.3] (125-236-193-159.adsl.xtra.co.nz [125.236.193.159]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: aaron@thelastpickle.com) by homiemail-a46.g.dreamhost.com (Postfix) with ESMTPSA id BED863E4057 for ; Thu, 19 Apr 2012 04:06:05 -0700 (PDT) From: aaron morton Mime-Version: 1.0 (Apple Message framework v1257) Content-Type: multipart/alternative; boundary="Apple-Mail=_38CE64BF-925F-4990-9FF9-B5F49DDC6307" Subject: Re: RMI/JMX errors, weird Date: Thu, 19 Apr 2012 23:06:02 +1200 In-Reply-To: <4F8F77B3.70907@bnl.gov> To: user@cassandra.apache.org References: <4F876809.6000006@rightscale.com> <4F8769A4.3050702@bnl.gov> <950D362D-ABA2-42CF-B701-7D4AF2FC9112@thelastpickle.com> <4F8F77B3.70907@bnl.gov> Message-Id: <5B4DAE9F-8536-4B48-8CEC-EC59E9C4C796@thelastpickle.com> X-Mailer: Apple Mail (2.1257) X-Virus-Checked: Checked by ClamAV on apache.org --Apple-Mail=_38CE64BF-925F-4990-9FF9-B5F49DDC6307 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=iso-8859-1 At some point the gossip system on the node this log is from decided = that 130.199.185.195 was DOWN. This was based on how often the node was = gossiping to the cluster.=20 The active repair session was informed. And to avoid failing the job = unnecessarily it tested that the errant nodes phi value was twice the = configured phi_convict_threshold. It was and the repair was killed.=20 Take a look at the logs on 130.199.185.195 and see if anything was = happening on the node at the same time. Could be GC or an overloaded = node (it would log about dropped messages).=20 Perhaps other nodes also saw 130.199.185.195 as down? it only needed to = be down for a few seconds.=20 Hope that helps.=20 ----------------- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/04/2012, at 2:25 PM, Maxim Potekhin wrote: > Server log below. Mind you that all the nodes are still up -- even = though reported as "dead" in this log. > What's going on here? >=20 > Thanks! >=20 > INFO [GossipTasks:1] 2012-04-18 22:18:26,487 Gossiper.java (line 719) = InetAddress /130.199.185.193 is now dead. > INFO [ScheduledTasks:1] 2012-04-18 22:18:26,487 StatusLogger.java = (line 50) Pool Name Active Pending Blocked > ERROR [GossipTasks:1] 2012-04-18 22:18:26,488 AntiEntropyService.java = (line 722) Problem during repair session manual-repair-1b3453b > 6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died > ERROR [RMI TCP Connection(22)-130.199.185.194] 2012-04-18 22:18:26,488 = StorageService.java (line 1607) Repair session org.apache.cas > sandra.service.AntiEntropyService$RepairSession@4cc9e2bc failed. > java.util.concurrent.ExecutionException: java.lang.RuntimeException: = java.io.IOException: Problem during repair session manual-repai > r-43545b22-ffe8-4243-8a98-509bbfec9872, endpoint /130.199.185.195 died > at = java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > at = org.apache.cassandra.service.StorageService.forceTableRepair(StorageServic= e.java:1603) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 39) > at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:93) > at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:27) > at = com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:2= 08) > at = com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) > at = com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) > at = com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanS= erverInterceptor.java:836) > at = com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) > at = javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImp= l.java:1427) > at = javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl= .java:72) > at = javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIC= onnectionImpl.java:1265) > at = javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMICon= nectionImpl.java:1360) > at = javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.jav= a:788) > at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown = Source) > at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at = sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303) > at sun.rmi.transport.Transport$1.run(Transport.java:159) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:155) > at = sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) > at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.jav= a:790) > at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java= :649) > at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.= java:886) > at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java= :908) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.lang.RuntimeException: java.io.IOException: Problem = during repair session manual-repair-43545b22-ffe8-4243-8a98-509b > bfec9872, endpoint /130.199.185.195 died > at = org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) > at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at = java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > ... 3 more > Caused by: java.io.IOException: Problem during repair session = manual-repair-43545b22-ffe8-4243-8a98-509bbfec9872, endpoint /130.199. > 185.195 died > at = org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(A= ntiEntropyService.java:723) > at = org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(Anti= EntropyService.java:760) > at = org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:16= 5) > at = org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538) > at = org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57) > at = org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157) > at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at = java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)= > at = java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) > at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acces= s$101(ScheduledThreadPoolExecutor.java:98) > at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPe= riodic(ScheduledThreadPoolExecutor.java:180) > at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(S= cheduledThreadPoolExecutor.java:204) > ... 3 more > INFO [FlushWriter:4] 2012-04-18 22:18:26,489 Memtable.java (line 237) = Writing Memtable-files.pandaid@988533693(1754325/140346000 se > rialized/live bytes, 70173 ops) > ERROR [AntiEntropySessions:8] 2012-04-18 22:18:26,488 = AntiEntropyService.java (line 691) Repair session = manual-repair-1b3453b6-28b5- > 4abd-84ce-0326b5468064 (on cfs [Ljava.lang.String;@5a44cc06, range = (113427455640312821154458202477256070484,0]) failed with the foll > owing error > java.io.IOException: Problem during repair session = manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint = /130.199.185.193 die > d > at = org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(A= ntiEntropyService.java:723) > at = org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(Anti= EntropyService.java:760) > at = org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:16= 5) > at = org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538) > at = org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57) > at = org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157) > at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at = java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)= > at = java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) > at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acces= s$101(ScheduledThreadPoolExecutor.java:98) > at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPe= riodic(ScheduledThreadPoolExecutor.java:180) > at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(S= cheduledThreadPoolExecutor.java:204) > at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.= java:886) > at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java= :908) > at java.lang.Thread.run(Thread.java:662) > INFO [ScheduledTasks:1] 2012-04-18 22:18:26,488 StatusLogger.java = (line 65) ReadStage 0 0 0 > ERROR [RMI TCP Connection(22)-130.199.185.194] 2012-04-18 22:18:26,491 = StorageService.java (line 1607) Repair session org.apache.cas > sandra.service.AntiEntropyService$RepairSession@41bb342c failed. > java.util.concurrent.ExecutionException: java.lang.RuntimeException: = java.io.IOException: Problem during repair session manual-repai > r-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died > at = java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > at = org.apache.cassandra.service.StorageService.forceTableRepair(StorageServic= e.java:1603) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 39) > at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:93) > at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:27) > at = com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:2= 08) > at = com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) > at = com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) > at = com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanS= erverInterceptor.java:836) > at = com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) > at = javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImp= l.java:1427) > at = javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl= .java:72) > at = javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIC= onnectionImpl.java:1265) > at = javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMICon= nectionImpl.java:1360) > at = javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.jav= a:788) > at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown = Source) > at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at = sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303) > at sun.rmi.transport.Transport$1.run(Transport.java:159) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:155) > at = sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) > at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.jav= a:790) > at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java= :649) > at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.= java:886) > at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java= :908) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.lang.RuntimeException: java.io.IOException: Problem = during repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326 > b5468064, endpoint /130.199.185.193 died > at = org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) > at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at = java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > ... 3 more > Caused by: java.io.IOException: Problem during repair session = manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199. > 185.193 died > at = org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(A= ntiEntropyService.java:723) > at = org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(Anti= EntropyService.java:760) > at = org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:16= 5) > at = org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538) > at = org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57) > at = org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157) > at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at = java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)= >=20 >=20 >=20 > On 4/12/2012 10:03 PM, aaron morton wrote: >>=20 >> Look at the server side logs for errors.=20 >>=20 >> Cheers >>=20 >> ----------------- >> Aaron Morton >> Freelance Developer >> @aaronmorton >> http://www.thelastpickle.com >>=20 >> On 13/04/2012, at 11:47 AM, Maxim Potekhin wrote: >>=20 >>> Hello, >>>=20 >>> I'm doing compactions under 0.8.8. >>>=20 >>> Recently, I started seeing a stack trace like one below, and I can't = figure out what causes this to appear. >>> The cluster has been in operation for mode than half a year w/o = errors like this one. >>>=20 >>> Any help will be appreciated, >>> Thanks >>>=20 >>> Maxim >>>=20 >>>=20 >>> WARNING: Failed to check the connection: = java.net.SocketTimeoutException: Read timed out >>> Exception in thread "main" java.io.IOException: Repair command #1: = some repair session(s) failed (see log for details). >>> at = org.apache.cassandra.service.StorageService.forceTableRepair(StorageServic= e.java:1613) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native = Method) >>> at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 39) >>> at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25) >>> at java.lang.reflect.Method.invoke(Method.java:597) >>> at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:93) >>> at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:27) >>> at = com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:2= 08) >>> at = com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) >>> at = com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) >>> at = com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanS= erverInterceptor.java:836) >>> at = com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) >>> at = javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImp= l.java:1427) >>> at = javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl= .java:72) >>> at = javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIC= onnectionImpl.java:1265) >>> at = javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMICon= nectionImpl.java:1360) >>> at = javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.jav= a:788) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native = Method) >>> at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 39) >>> at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25) >>> at java.lang.reflect.Method.invoke(Method.java:597) >>> at = sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303) >>> at sun.rmi.transport.Transport$1.run(Transport.java:159) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at = sun.rmi.transport.Transport.serviceCall(Transport.java:155) >>> at = sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) >>> at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.jav= a:790) >>> at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java= :649) >>> at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.= java:886) >>> at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java= :908) >>> at java.lang.Thread.run(Thread.java:662) >>>=20 >>=20 >=20 --Apple-Mail=_38CE64BF-925F-4990-9FF9-B5F49DDC6307 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=iso-8859-1 At = some point the gossip system on the node this log is from decided = that 130.199.185.195 was DOWN. This was based on how often the node = was gossiping to the cluster. 

The active repair = session was informed. And to avoid failing the job unnecessarily it = tested that the errant nodes phi value was twice the configured = phi_convict_threshold. It was and the repair was = killed. 

Take a look at the logs = on 130.199.185.195 and see if anything was happening on the node at = the same time. Could  be GC or an overloaded node (it would log = about dropped messages). 

Perhaps other = nodes also saw 130.199.185.195 as down? it only needed to be down = for a few seconds. 

Hope that = helps. 

http://www.thelastpickle.com

On 19/04/2012, at 2:25 PM, Maxim Potekhin wrote:

=20 =20
Server log below. Mind you that all the nodes are still up -- even though reported as "dead" in this log.
What's going on here?

Thanks!

 INFO [GossipTasks:1] 2012-04-18 22:18:26,487 Gossiper.java = (line 719) InetAddress /130.199.185.193 is now dead.
 INFO [ScheduledTasks:1] 2012-04-18 22:18:26,487 = StatusLogger.java (line 50) Pool = Name           &nbs= p;        Active   = Pending   Blocked
ERROR [GossipTasks:1] 2012-04-18 22:18:26,488 AntiEntropyService.java (line 722) Problem during repair session manual-repair-1b3453b
6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died
ERROR [RMI TCP Connection(22)-130.199.185.194] 2012-04-18 22:18:26,488 StorageService.java (line 1607) Repair session org.apache.cas
sandra.service.AntiEntropyService$RepairSession@4cc9e2bc failed.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Problem during repair session manual-repai
r-43545b22-ffe8-4243-8a98-509bbfec9872, endpoint /130.199.185.195 died
        at = java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at = java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at = org.apache.cassandra.service.StorageService.forceTableRepair(StorageServic= e.java:1603)
        at = sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 39)
        at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25)
        at = java.lang.reflect.Method.invoke(Method.java:597)
        at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:93)
        at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:27)
        at = com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:2= 08)
        at = com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
        at = com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
        at = com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanS= erverInterceptor.java:836)
        at = com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
=         at = javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImp= l.java:1427)
        at = javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl= .java:72)
        at = javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIC= onnectionImpl.java:1265)
        at = javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMICon= nectionImpl.java:1360)
        at = javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.jav= a:788)
        at = sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25)
        at = java.lang.reflect.Method.invoke(Method.java:597)
        at = sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
        at = sun.rmi.transport.Transport$1.run(Transport.java:159)
        at = java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
        at = sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)         at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.jav= a:790)
        at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java= :649)
        at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.= java:886)
        at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java= :908)
        at = java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: java.io.IOException: Problem during repair session manual-repair-43545b22-ffe8-4243-8a98-509b
bfec9872, endpoint /130.199.185.195 died
        at = org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)         at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)         at = java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at = java.util.concurrent.FutureTask.run(FutureTask.java:138)
        ... 3 more
Caused by: java.io.IOException: Problem during repair session manual-repair-43545b22-ffe8-4243-8a98-509bbfec9872, endpoint /130.199.
185.195 died
        at = org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(A= ntiEntropyService.java:723)
        at = org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(Anti= EntropyService.java:760)
        at = org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:16= 5)
        at = org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)
        at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
        at = org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
        at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)         at = java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)=
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acces= s$101(ScheduledThreadPoolExecutor.java:98)
        at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPe= riodic(ScheduledThreadPoolExecutor.java:180)
        at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(S= cheduledThreadPoolExecutor.java:204)
        ... 3 more
 INFO [FlushWriter:4] 2012-04-18 22:18:26,489 Memtable.java = (line 237) Writing Memtable-files.pandaid@988533693(1754325/140346000 = se
rialized/live bytes, 70173 ops)
ERROR [AntiEntropySessions:8] 2012-04-18 22:18:26,488 AntiEntropyService.java (line 691) Repair session manual-repair-1b3453b6-28b5-
4abd-84ce-0326b5468064 (on cfs [Ljava.lang.String;@5a44cc06, range (113427455640312821154458202477256070484,0]) failed with the = foll
owing error
java.io.IOException: Problem during repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 die
d
        at = org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(A= ntiEntropyService.java:723)
        at = org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(Anti= EntropyService.java:760)
        at = org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:16= 5)
        at = org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)
        at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
        at = org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
        at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)         at = java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)=
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acces= s$101(ScheduledThreadPoolExecutor.java:98)
        at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPe= riodic(ScheduledThreadPoolExecutor.java:180)
        at = java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(S= cheduledThreadPoolExecutor.java:204)
        at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.= java:886)
        at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java= :908)
        at = java.lang.Thread.run(Thread.java:662)
 INFO [ScheduledTasks:1] 2012-04-18 22:18:26,488 = StatusLogger.java (line 65) = ReadStage           = ;            &= nbsp; 0         = 0         0
ERROR [RMI TCP Connection(22)-130.199.185.194] 2012-04-18 22:18:26,491 StorageService.java (line 1607) Repair session org.apache.cas
sandra.service.AntiEntropyService$RepairSession@41bb342c failed.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Problem during repair session manual-repai
r-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died
        at = java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at = java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at = org.apache.cassandra.service.StorageService.forceTableRepair(StorageServic= e.java:1603)
        at = sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 39)
        at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25)
        at = java.lang.reflect.Method.invoke(Method.java:597)
        at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:93)
        at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:27)
        at = com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:2= 08)
        at = com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
        at = com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
        at = com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanS= erverInterceptor.java:836)
        at = com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
=         at = javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImp= l.java:1427)
        at = javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl= .java:72)
        at = javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIC= onnectionImpl.java:1265)
        at = javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMICon= nectionImpl.java:1360)
        at = javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.jav= a:788)
        at = sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25)
        at = java.lang.reflect.Method.invoke(Method.java:597)
        at = sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
        at = sun.rmi.transport.Transport$1.run(Transport.java:159)
        at = java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
        at = sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)         at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.jav= a:790)
        at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java= :649)
        at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.= java:886)
        at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java= :908)
        at = java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: java.io.IOException: Problem during repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326
b5468064, endpoint /130.199.185.193 died
        at = org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)         at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)         at = java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at = java.util.concurrent.FutureTask.run(FutureTask.java:138)
        ... 3 more
Caused by: java.io.IOException: Problem during repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.
185.193 died
        at = org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(A= ntiEntropyService.java:723)
        at = org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(Anti= EntropyService.java:760)
        at = org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:16= 5)
        at = org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)
        at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
        at = org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
        at = java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)         at = java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)=



On 4/12/2012 10:03 PM, aaron morton wrote:
Look at the server side logs for errors. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 13/04/2012, at 11:47 AM, Maxim Potekhin wrote:

Hello,

I'm doing compactions under 0.8.8.

Recently, I started seeing a stack trace like one below, and I can't figure out what causes this to appear.
The cluster has been in operation for mode than half a year w/o errors like this one.

Any help will be appreciated,
Thanks

Maxim


WARNING: Failed to check the connection: java.net.SocketTimeoutException: Read timed out
Exception in thread "main" java.io.IOException: Repair command #1: some repair session(s) failed (see log for details).
       at = org.apache.cassandra.service.StorageService.forceTableRepair(StorageServic= e.java:1613)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 39)
       at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25)
       at = java.lang.reflect.Method.invoke(Method.java:597)
       at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:93)
       at = com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIn= trospector.java:27)
       at = com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:2= 08)
       at = com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
       at = com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
       at = com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanS= erverInterceptor.java:836)
       at = com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
=        at = javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImp= l.java:1427)
       at = javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl= .java:72)
       at = javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIC= onnectionImpl.java:1265)
       at = javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMICon= nectionImpl.java:1360)
       at = javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.jav= a:788)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 39)
       at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:25)
       at = java.lang.reflect.Method.invoke(Method.java:597)
       at = sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
       at sun.rmi.transport.Transport$1.run(Transport.java:159)
       at java.security.AccessController.doPrivileged(Native = Method)
       at = sun.rmi.transport.Transport.serviceCall(Transport.java:155)
       at = sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)        at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.jav= a:790)
       at = sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java= :649)
       at = java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.= java:886)
       at = java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java= :908)
       at = java.lang.Thread.run(Thread.java:662)




= --Apple-Mail=_38CE64BF-925F-4990-9FF9-B5F49DDC6307--