incubator-cassandra-user mailing list archives

From Maxim Potekhin <potek...@bnl.gov>
Subject Re: RMI/JMX errors, weird
Date Thu, 19 Apr 2012 12:29:18 GMT
Hello Aaron,

How should I go about fixing that? Also, after a repeated attempt to compact,
it again goes into "building secondary index" with 1150 pending tasks and is
not making progress. I suspected a disk system failure, but this needs to be
confirmed.

So basically, do I need to tune the phi threshold up? The thing is, there was
no heavy load on the cluster at all.

Thanks

Maxim




On 4/19/2012 7:06 AM, aaron morton wrote:
> At some point the gossip system on the node this log is from decided 
> that 130.199.185.195 was DOWN. This was based on how often the node 
> was gossiping to the cluster.
>
> The active repair session was informed. To avoid failing the job
> unnecessarily, it tested whether the errant node's phi value was twice the
> configured phi_convict_threshold. It was, and the repair was killed.
>
> Take a look at the logs on 130.199.185.195 and see if anything was
> happening on the node at the same time. It could be GC or an overloaded
> node (it would log about dropped messages).
>
> Perhaps other nodes also saw 130.199.185.195 as down? It only needed
> to be down for a few seconds.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/04/2012, at 2:25 PM, Maxim Potekhin wrote:
>
>> Server log below. Mind you that all the nodes are still up -- even 
>> though reported as "dead" in this log.
>> What's going on here?
>>
>> Thanks!
>>
>>  INFO [GossipTasks:1] 2012-04-18 22:18:26,487 Gossiper.java (line 719) InetAddress /130.199.185.193 is now dead.
>>  INFO [ScheduledTasks:1] 2012-04-18 22:18:26,487 StatusLogger.java (line 50) Pool Name                    Active   Pending   Blocked
>> ERROR [GossipTasks:1] 2012-04-18 22:18:26,488 AntiEntropyService.java (line 722) Problem during repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died
>> ERROR [RMI TCP Connection(22)-130.199.185.194] 2012-04-18 22:18:26,488 StorageService.java (line 1607) Repair session org.apache.cassandra.service.AntiEntropyService$RepairSession@4cc9e2bc failed.
>> java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Problem during repair session manual-repair-43545b22-ffe8-4243-8a98-509bbfec9872, endpoint /130.199.185.195 died
>>         at 
>> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>         at 
>> org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1603)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at 
>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>         at 
>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>         at 
>> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>         at 
>> com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>>         at 
>> com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>>         at 
>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>>         at 
>> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>>         at 
>> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>>         at 
>> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>         at 
>> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>         at 
>> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>         at 
>> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>>         at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at 
>> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
>>         at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>         at 
>> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>         at 
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>         at 
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.lang.RuntimeException: java.io.IOException: Problem during repair session manual-repair-43545b22-ffe8-4243-8a98-509bbfec9872, endpoint /130.199.185.195 died
>>         at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>         at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>         at 
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>         ... 3 more
>> Caused by: java.io.IOException: Problem during repair session manual-repair-43545b22-ffe8-4243-8a98-509bbfec9872, endpoint /130.199.185.195 died
>>         at 
>> org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:723)
>>         at 
>> org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:760)
>>         at 
>> org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:165)
>>         at 
>> org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)
>>         at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
>>         at 
>> org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
>>         at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>         at 
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at 
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         ... 3 more
>>  INFO [FlushWriter:4] 2012-04-18 22:18:26,489 Memtable.java (line 237) Writing Memtable-files.pandaid@988533693(1754325/140346000 serialized/live bytes, 70173 ops)
>> ERROR [AntiEntropySessions:8] 2012-04-18 22:18:26,488 AntiEntropyService.java (line 691) Repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064 (on cfs [Ljava.lang.String;@5a44cc06, range (113427455640312821154458202477256070484,0]) failed with the following error
>> java.io.IOException: Problem during repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died
>>         at 
>> org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:723)
>>         at 
>> org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:760)
>>         at 
>> org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:165)
>>         at 
>> org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)
>>         at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
>>         at 
>> org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
>>         at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>         at 
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at 
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>>  INFO [ScheduledTasks:1] 2012-04-18 22:18:26,488 StatusLogger.java (line 65) ReadStage                         0         0         0
>> ERROR [RMI TCP Connection(22)-130.199.185.194] 2012-04-18 22:18:26,491 StorageService.java (line 1607) Repair session org.apache.cassandra.service.AntiEntropyService$RepairSession@41bb342c failed.
>> java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Problem during repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died
>>         at 
>> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>         at 
>> org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1603)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at 
>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>         at 
>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>         at 
>> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>         at 
>> com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>>         at 
>> com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>>         at 
>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>>         at 
>> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>>         at 
>> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>>         at 
>> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>         at 
>> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>         at 
>> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>         at 
>> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>>         at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at 
>> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
>>         at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>         at 
>> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>         at 
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>         at 
>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.lang.RuntimeException: java.io.IOException: Problem during repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died
>>         at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>         at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>         at 
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>         ... 3 more
>> Caused by: java.io.IOException: Problem during repair session manual-repair-1b3453b6-28b5-4abd-84ce-0326b5468064, endpoint /130.199.185.193 died
>>         at 
>> org.apache.cassandra.service.AntiEntropyService$RepairSession.failedNode(AntiEntropyService.java:723)
>>         at 
>> org.apache.cassandra.service.AntiEntropyService$RepairSession.convict(AntiEntropyService.java:760)
>>         at 
>> org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:165)
>>         at 
>> org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:538)
>>         at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:57)
>>         at 
>> org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:157)
>>         at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>         at 
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>
>>
>>
>> On 4/12/2012 10:03 PM, aaron morton wrote:
>>> Look at the server side logs for errors.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 13/04/2012, at 11:47 AM, Maxim Potekhin wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm doing compactions under 0.8.8.
>>>>
>>>> Recently, I started seeing a stack trace like the one below, and I
>>>> can't figure out what causes it to appear.
>>>> The cluster has been in operation for more than half a year w/o
>>>> errors like this one.
>>>>
>>>> Any help will be appreciated,
>>>> Thanks
>>>>
>>>> Maxim
>>>>
>>>>
>>>> WARNING: Failed to check the connection: 
>>>> java.net.SocketTimeoutException: Read timed out
>>>> Exception in thread "main" java.io.IOException: Repair command #1: 
>>>> some repair session(s) failed (see log for details).
>>>>        at 
>>>> org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1613)
>>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>        at 
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>        at 
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>>        at 
>>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>>>>        at 
>>>> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
>>>>        at 
>>>> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>>>>        at 
>>>> com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
>>>>        at 
>>>> com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
>>>>        at 
>>>> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
>>>>        at 
>>>> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
>>>>        at 
>>>> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
>>>>        at 
>>>> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
>>>>        at 
>>>> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
>>>>        at 
>>>> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
>>>>        at 
>>>> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
>>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>        at 
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>        at 
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>>>        at 
>>>> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
>>>>        at sun.rmi.transport.Transport$1.run(Transport.java:159)
>>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>>        at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
>>>>        at 
>>>> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
>>>>        at 
>>>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
>>>>        at 
>>>> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
>>>>        at 
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>        at 
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>        at java.lang.Thread.run(Thread.java:662)
>>>>
>>>
>>
>
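
The conviction behaviour described in the thread above (gossip marks a node down once its phi value passes phi_convict_threshold, while a running repair session is only failed once phi reaches twice that value) can be illustrated with a small sketch. This is not Cassandra's code: the class PhiSketch, its method names, the exponential model for heartbeat gaps, and the example threshold of 8 are all assumptions chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; this is not Cassandra's FailureDetector.
final class PhiSketch {
    private final Deque<Long> intervalsMs = new ArrayDeque<>();
    private final int windowSize;
    private long lastHeartbeatMs = -1;

    PhiSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Record a gossip heartbeat and remember the gap since the previous one.
    void heartbeat(long nowMs) {
        if (lastHeartbeatMs >= 0) {
            if (intervalsMs.size() == windowSize) {
                intervalsMs.removeFirst();
            }
            intervalsMs.addLast(nowMs - lastHeartbeatMs);
        }
        lastHeartbeatMs = nowMs;
    }

    // Assumed model: gaps are exponentially distributed, so the chance of a
    // silence of length t is exp(-t / mean), and phi is -log10 of that chance.
    double phi(long nowMs) {
        if (intervalsMs.isEmpty()) {
            return 0.0;
        }
        double sum = 0;
        for (long gap : intervalsMs) {
            sum += gap;
        }
        double mean = sum / intervalsMs.size();
        double silence = nowMs - lastHeartbeatMs;
        return (silence / mean) / Math.log(10.0);
    }

    // Gossip-level decision: the node is reported as down.
    static boolean markedDown(double phi, double threshold) {
        return phi > threshold;
    }

    // Repair-level decision: only fail the session at twice the threshold.
    static boolean repairFails(double phi, double threshold) {
        return phi >= 2 * threshold;
    }

    public static void main(String[] args) {
        PhiSketch fd = new PhiSketch(1000);
        // Eleven heartbeats, one per second, ending at t = 10 s.
        for (long t = 0; t <= 10_000; t += 1_000) {
            fd.heartbeat(t);
        }
        double threshold = 8.0; // example value, assumed for this sketch
        for (long silenceMs : new long[] {5_000, 20_000, 40_000}) {
            double phi = fd.phi(10_000 + silenceMs);
            System.out.printf("silence=%dms phi=%.2f down=%b repairFails=%b%n",
                    silenceMs, phi,
                    markedDown(phi, threshold), repairFails(phi, threshold));
        }
    }
}
```

With one-second heartbeats in this model, a 20-second silence is enough to mark the node down but not to fail the repair, while a 40-second silence crosses the doubled threshold. Raising the threshold makes both decisions more tolerant of pauses such as long GC, at the cost of slower detection of a node that really is down.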

