cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6648) Race condition during node bootstrapping
Date Wed, 05 Feb 2014 07:54:09 GMT


Sylvain Lebresne commented on CASSANDRA-6648:

If this is a regression of CASSANDRA-6615, it doesn't affect 2.0.4, does it? (asking because
of the current 'reproduced in').

[~enigmacurry] would be great if you could push that test of yours above as a dtest.

> Race condition during node bootstrapping
> ----------------------------------------
>                 Key: CASSANDRA-6648
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sergio Bossa
>            Assignee: Sergio Bossa
>            Priority: Critical
>             Fix For: 1.2.15, 2.0.6
>         Attachments: 6648-v2.txt, 6648-v3-1.2.txt, 6648-v3.txt, CASSANDRA-6648.patch
> When bootstrapping a new node, data is "missing" as if the new node didn't actually bootstrap, which I tracked down to the following scenario:
> 1) New node joins token ring and waits for schema to be settled before actually bootstrapping.
> 2) The schema check somehow passes and it starts bootstrapping.
> 3) Bootstrapping doesn't find the ks/cf that it should have received from the other node.
> 4) Queries at this point cause NPEs until they later "recover", but data is missing.
> The problem seems to be caused by a race condition between the migration manager and the bootstrapper, with the former running after the latter.
> I think this is supposed to protect against such scenarios:
> {noformat}
>             while (!MigrationManager.isReadyForBootstrap())
>             {
>                 setMode(Mode.JOINING, "waiting for schema information to complete", true);
>                 Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
>             }
> {noformat}
> But the MigrationManager.isReadyForBootstrap() implementation is quite fragile and doesn't take into account "slow" schema propagation.
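
One way to make a polling loop like the one quoted above less sensitive to slow schema propagation is to require the readiness check to hold for several consecutive polls, with an overall deadline. The following is only a minimal sketch under that assumption; it is not the attached patch and not Cassandra code. The class name StableReadiness, the method awaitStable and all of its parameters are hypothetical, and the readiness check itself (for example MigrationManager.isReadyForBootstrap()) would be passed in by the caller.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Cassandra: waits until a readiness check
// has returned true on several consecutive polls, or gives up at a deadline.
public final class StableReadiness
{
    private StableReadiness() {}

    /**
     * @param ready               the readiness check to poll (assumed cheap and side-effect free)
     * @param requiredConsecutive how many polls in a row must return true
     * @param pollMillis          pause between polls
     * @param timeoutMillis       overall time budget
     * @return true if the check stayed true long enough, false on timeout or interrupt
     */
    public static boolean awaitStable(BooleanSupplier ready, int requiredConsecutive,
                                      long pollMillis, long timeoutMillis)
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        int consecutive = 0;
        while (true)
        {
            // A single false result resets the streak, so a check that only
            // passes transiently does not end the wait.
            consecutive = ready.getAsBoolean() ? consecutive + 1 : 0;
            if (consecutive >= requiredConsecutive)
                return true;
            if (System.nanoTime() - deadline >= 0)
                return false;
            try
            {
                TimeUnit.MILLISECONDS.sleep(pollMillis);
            }
            catch (InterruptedException e)
            {
                // Restore the interrupt flag and report "not ready".
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller would keep the node in its joining state while awaitStable runs and decide separately what to do when it returns false. Whether a streak of successful polls is actually sufficient depends on how the underlying check is implemented, which this sketch does not address.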

This message was sent by Atlassian JIRA
