cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors
Date Mon, 22 Feb 2010 16:36:28 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-800.
--------------------------------------

    Resolution: Fixed
      Assignee:     (was: Jaakko Laine)

I'm going to close this as a duplicate of CASSANDRA-757 even though the two show different
errors, since the right fix for 757 is to use a concurrent data structure, which will also fix
any other ConcurrentModificationExceptions (CMEs).
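To illustrate the kind of change meant by "a concurrent structure", here is a minimal sketch. It is not the actual CASSANDRA-757 patch: the class name LiveEndpointsSketch, its methods, and the use of String endpoints are all hypothetical stand-ins. It assumes the live-endpoint set is copied into a new HashSet on every read, as the getLiveMembers frame in the quoted stack trace suggests, while another thread marks nodes up or down.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the gossiper's live-endpoint bookkeeping.
// This is an illustration only, not Cassandra's code.
final class LiveEndpointsSketch {
    // Concurrent set: its iterators are weakly consistent and never throw
    // ConcurrentModificationException, even while other threads mutate it.
    private final Set<String> liveEndpoints = ConcurrentHashMap.newKeySet();

    void markAlive(String endpoint) {
        liveEndpoints.add(endpoint);
    }

    void markDead(String endpoint) {
        liveEndpoints.remove(endpoint);
    }

    // Same shape as the failing call in the trace: copy the live set into a
    // fresh HashSet. With a plain HashSet as the source, this copy can throw
    // a CME if a writer modifies the set during iteration.
    Set<String> getLiveMembers() {
        return new HashSet<>(liveEndpoints);
    }

    public static void main(String[] args) throws InterruptedException {
        LiveEndpointsSketch gossiper = new LiveEndpointsSketch();

        // Writer thread flips endpoints up and down, like the gossip timer.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                String endpoint = "10.0.0." + (i % 200);
                gossiper.markAlive(endpoint);
                gossiper.markDead(endpoint);
            }
        });
        writer.start();

        // Reader takes snapshots concurrently, like the request threads.
        for (int i = 0; i < 10_000; i++) {
            gossiper.getLiveMembers();
        }
        writer.join();
        System.out.println("live after churn: " + gossiper.getLiveMembers().size());
    }
}
```

The snapshot may miss or include an endpoint that changes state mid-copy, which is usually acceptable for membership reads; the point is only that the copy no longer fails.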

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>             Fix For: 0.5
>
>         Attachments: 800.txt
>
>
> We're seeing a lot of nodes flapping. It may be a race condition in Gossip.
> On 10.209.23.110:
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> On 10.209.23.80 at about the same time:
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> And just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

