cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthew F. Dennis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1160) race with insufficiently constructed Gossiper
Date Mon, 07 Jun 2010 23:35:15 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876489#action_12876489
] 

Matthew F. Dennis commented on CASSANDRA-1160:
----------------------------------------------

I should have been more clear about that.  The failures in StorageServiceClientTest are because
files from DatabaseDescriptor.getAllDataFileLocations() exist (the test asserts they do not).
 This is because when I moved start into <init> I passed the same generation number
that that initServer() uses, but initClient (used by the test) passes a generation number
based on currentTimeMillis() which doesn't cause the file paths to exist.  It wasn't clear
to me there was an easy way to determine what generation to use.

More importantly, this lead me to notice that both StorageService and StorageLoadBalancer
are registered with Gossiper before start is called.  If start was moved into <init>
I was afraid of introducing new race conditions (e.g. Gossiper starting up, doing things that
would have resulted in publications and then later receiving the registration for such publications).

I agree that Gossiper is likely not being spun up correctly.  I think this applies to other
things as well (in general things seem to have many interdependencies on boot.  I don't have
anything specific to point at, but it feels like there are several race conditions that just
happen to usually not be exhibited).

One solution I've used in the past that I like for this in general is that none of the singletons
are inited in their own classes, but only by a BootInitializer of sorts that completely controls
when the objects are inited/started/created/etc, in particular controlling the order things
are done.  A single entry point into the system if you will.  This seems like some work from
the point we're at now and probably a bit unnecessary.

Along those lines I had a similar patch that I didn't submit that took a third argument, List<IEndpointStateChangeSubsciber>
initialSubscribers.  Each initialSubscriber was registered before the existing code in start
was called.  initServer built a list of the StorageService and StorageLoadBalance instances
and passed that to start to ensure none of the publications were missed by either service.
 Like the 0001-cassandra-0.6-1160.patch references to .instance were replaced with getInstance()
calls which busy waited (via yield) until start completed but this meant that StorageService
was managing StorageLoadBalancer registrations and that didn't feel right, hence preRegistrations()
(so each could manager their own registrations).

So, basically two issues/questions if Gossiper.start moves to Gossiper.<init>:

1) how does Gossiper.<init> get the correct generation number (server v client)?

2) what about missed publications because Gossiper was started before StorageService and/or
StorageLoadBalancer got a chance to register?


> race with insufficiently constructed Gossiper
> ---------------------------------------------
>
>                 Key: CASSANDRA-1160
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1160
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>            Priority: Minor
>             Fix For: 0.6.3
>
>         Attachments: 0001-cassandra-0.6-1160.patch
>
>
> Gossiper.start needs to be integrated into the constructor.  Currently you can have threads
using the gossiper instance before start finishes (or even starts?), resulting in tracebacks
like this:
> ERROR [GMFD:1] 2010-06-02 10:45:49,878 CassandraDaemon.java (line 78) Fatal exception
in thread Thread[GMFD:1,5,main]
> java.lang.AssertionError
> 	at org.apache.cassandra.net.Header.<init>(Header.java:56)
> 	at org.apache.cassandra.net.Header.<init>(Header.java:74)
> 	at org.apache.cassandra.net.Message.<init>(Message.java:58)
> 	at org.apache.cassandra.gms.Gossiper.makeGossipDigestAckMessage(Gossiper.java:294)
> 	at org.apache.cassandra.gms.Gossiper$GossipDigestSynVerbHandler.doVerb(Gossiper.java:935)
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:619)
> ERROR [GMFD:2] 2010-06-02 10:45:49,880 CassandraDaemon.java (line 78) Fatal exception
in thread Thread[GMFD:2,5,main]
> java.lang.AssertionError
> 	at org.apache.cassandra.net.Header.<init>(Header.java:56)
> 	at org.apache.cassandra.net.Header.<init>(Header.java:74)
> 	at org.apache.cassandra.net.Message.<init>(Message.java:58)
> 	at org.apache.cassandra.gms.Gossiper.makeGossipDigestAckMessage(Gossiper.java:294)
> 	at org.apache.cassandra.gms.Gossiper$GossipDigestSynVerbHandler.doVerb(Gossiper.java:935)
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message