cassandra-commits mailing list archives

From "Michael Fong (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13569) Schedule schema pulls just once per endpoint
Date Fri, 23 Jun 2017 15:15:00 GMT


Michael Fong commented on CASSANDRA-13569:

Hi, []

I agree with you that even using a ScheduledExecutor for MigrationTask would fail in rare cases.

In CASSANDRA-11748, we had patched our own v2.0 source code with a similar idea that limits
schema pulls to once per endpoint. However, we later observed a corner case: when two nodes
with different schema versions boot up at the same time, with one node running slightly
(a few seconds) ahead of the other, the first node requests a schema pull, which fails because
the other node has not yet finished initialization.
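
A minimal sketch of how this corner case can arise, assuming the guard marks an endpoint as handled before the outcome of the pull is known. The class {{OncePerEndpointGuard}} and its flag are hypothetical and only model the idea in a single thread; they are not taken from the v2.0 patch or from Cassandra.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical, single-threaded model of a "pull once per endpoint" guard.
final class OncePerEndpointGuard {
    private final Set<String> handled = new HashSet<>();
    private final boolean markOnlyOnSuccess;

    OncePerEndpointGuard(boolean markOnlyOnSuccess) {
        this.markOnlyOnSuccess = markOnlyOnSuccess;
    }

    /** Runs the pull unless the endpoint is already marked; returns true if the pull succeeded. */
    boolean maybePull(String endpoint, BooleanSupplier pull) {
        if (handled.contains(endpoint)) {
            return false; // the guard suppresses any further pull for this endpoint
        }
        if (!markOnlyOnSuccess) {
            handled.add(endpoint); // naive variant: mark before the outcome is known
        }
        boolean succeeded = pull.getAsBoolean();
        if (succeeded) {
            handled.add(endpoint);
        }
        return succeeded;
    }

    public static void main(String[] args) {
        for (boolean markOnlyOnSuccess : new boolean[] {false, true}) {
            OncePerEndpointGuard guard = new OncePerEndpointGuard(markOnlyOnSuccess);
            boolean first = guard.maybePull("10.0.0.2", () -> false); // peer still initializing
            boolean second = guard.maybePull("10.0.0.2", () -> true); // peer ready by now
            System.out.println("markOnlyOnSuccess=" + markOnlyOnSuccess
                    + " first=" + first + " second=" + second);
        }
    }
}
```

With the naive variant, the second attempt never reaches the peer even though it would have succeeded. Marking only on success is one possible way around that, not necessarily the right fix for either code base.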

There are huge differences between the v2.0 and 3.x code bases, and I do not know if this corner
case still persists. Here is the problematic code snippet for your reference.
{code}
if (epState == null)  {
{code}
This check would probably not prevent it. In your patch, if the ScheduledFuture reports that it is
done, things could get much messier, since the schema migration would never happen.
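
For reference, a small sketch of the standard {{java.util.concurrent}} behavior behind this concern, reading "done" as {{ScheduledFuture.isDone()}} (an assumption about what is meant). A future counts as done whether the task finished normally, threw, or was cancelled, so "done" alone does not say that the pull succeeded. The class name {{DoneButFailed}} and the failing task are illustrative only.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: shows that isDone() is true for a task that failed.
final class DoneButFailed {
    static boolean isDoneAfterFailure() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // Stand-in for a schema pull that fails because the peer is not ready.
        Runnable failingPull = () -> {
            throw new IllegalStateException("peer not initialized");
        };
        try {
            ScheduledFuture<?> future = executor.schedule(failingPull, 10, TimeUnit.MILLISECONDS);
            try {
                future.get(5, TimeUnit.SECONDS); // wait for the task to finish
            } catch (ExecutionException expected) {
                // The task threw; the future still counts as completed.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (TimeoutException e) {
                // Not expected with a 10 ms delay; fall through and report the state.
            }
            return future.isDone();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("isDone after failure: " + isDoneAfterFailure());
    }
}
```

Any bookkeeping that treats a done future as "schema already pulled" would therefore need a separate success signal.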


Michael Fong

> Schedule schema pulls just once per endpoint
> --------------------------------------------
>                 Key: CASSANDRA-13569
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Distributed Metadata
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>             Fix For: 3.0.x, 3.11.x, 4.x
> Schema mismatches detected through gossip will get resolved by calling {{MigrationManager.maybeScheduleSchemaPull}}.
> This method may decide to schedule execution of {{MigrationTask}}, but only after using a
> {{MIGRATION_DELAY_IN_MS = 60000}} delay (for reasons unclear to me). Meanwhile, as long as
> the migration task hasn't been executed, we'll continue to have schema mismatches reported
> by gossip and will have corresponding {{maybeScheduleSchemaPull}} calls, which will schedule
> further tasks with the mentioned delay. Some local testing shows that dozens of tasks for
> the same endpoint will eventually be executed, causing the same stormy behavior for that
> very endpoint.
> My proposal would be to simply not schedule new tasks for the same endpoint in case
> we still have pending tasks waiting for execution after {{MIGRATION_DELAY_IN_MS}}.
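
The proposal in the issue description can be sketched roughly as follows. This is not the attached patch: {{PendingSchemaPulls}}, its method names, and the use of plain strings for endpoints are assumptions made for illustration. The idea is to remember which endpoints already have a delayed task waiting and to refuse further scheduling until that task starts.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not Cassandra code: at most one pending pull per endpoint.
final class PendingSchemaPulls {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService executor;
    private final long delayMs;

    PendingSchemaPulls(ScheduledExecutorService executor, long delayMs) {
        this.executor = executor;
        this.delayMs = delayMs;
    }

    /** Returns true if a pull was scheduled, false if one is already waiting for this endpoint. */
    boolean maybeSchedule(String endpoint, Runnable pull) {
        if (!pending.add(endpoint)) {
            return false; // a task for this endpoint is still waiting out the delay
        }
        try {
            executor.schedule(() -> {
                // Free the slot before running, so a mismatch reported during or
                // after the pull can schedule a fresh attempt.
                pending.remove(endpoint);
                pull.run();
            }, delayMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (RejectedExecutionException e) {
            pending.remove(endpoint); // do not leave a stale marker behind
            throw e;
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        PendingSchemaPulls pulls = new PendingSchemaPulls(executor, 60_000);
        Runnable pull = () -> System.out.println("pulling schema");
        System.out.println(pulls.maybeSchedule("10.0.0.1", pull)); // first request for this endpoint
        System.out.println(pulls.maybeSchedule("10.0.0.1", pull)); // duplicate while still pending
        executor.shutdownNow(); // demo only: discard the delayed task instead of waiting
    }
}
```

Because the marker is cleared when the task starts rather than when it succeeds, a failed pull does not block later attempts triggered by gossip.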
