Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Thu, 12 Jan 2017 09:34:51 +0000 (UTC)
From: "Sylvain Lebresne (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.13033546.1484089216000.776.1484213691975@Atlassian.JIRA>
In-Reply-To: <JIRA.13033546.1484089216000@Atlassian.JIRA>
References: <JIRA.13033546.1484089216000@Atlassian.JIRA> <JIRA.13033546.1484089216285@arcas>
Subject: [jira] [Updated] (CASSANDRA-13115) Read repair is not blocking
 repair to finish in foreground repair
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Thu, 12 Jan 2017 09:34:53 -0000


     [ https://issues.apache.org/jira/browse/CASSANDRA-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-13115:
-----------------------------------------
         Assignee: Sylvain Lebresne
    Fix Version/s: 3.x
                   3.0.x
           Status: Patch Available  (was: Open)

You're absolutely right, this is clearly wrong, thanks for noticing.

I'm attaching the pretty trivial fix. Now, while looking at this, I realized we were also not handling "asynchronous read repairs" properly as we were not consuming the result of {{resolve()}} in that case. So a 2nd commit fixes that part (also fairly trivial).
| [13115-3.0|https://github.com/pcmanus/cassandra/commits/13115-3.0] | [utests|http://cassci.datastax.com/job/pcmanus-13115-3.0-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-13115-3.0-dtest] |
| [13115-3.X|https://github.com/pcmanus/cassandra/commits/13115-3.X] | [utests|http://cassci.datastax.com/job/pcmanus-13115-3.X-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-13115-3.X-dtest] |
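
To illustrate the second point, here is a minimal toy sketch of why an unconsumed result can mean no repairs get sent. It assumes the repair work is a side effect of iterating the result of {{resolve()}}, which is one reading of the comment above. All class and method names here are hypothetical and are not Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model only: none of these names are Cassandra APIs.
class LazyRepairSketch
{
    // Hypothetical stand-in for the iterator returned by resolve():
    // each call to next() merges one row and, as a side effect, records a repair.
    static Iterator<String> resolve(List<String> rows, List<String> repairsSent)
    {
        final Iterator<String> source = rows.iterator();
        return new Iterator<String>()
        {
            public boolean hasNext() { return source.hasNext(); }

            public String next()
            {
                String row = source.next();
                repairsSent.add("repair:" + row); // happens only when the row is consumed
                return row;
            }
        };
    }

    // Mirrors the reported problem: the result is created and then dropped.
    static int repairsWithoutConsuming(List<String> rows)
    {
        List<String> repairsSent = new ArrayList<>();
        resolve(rows, repairsSent); // never iterated, so no side effects run
        return repairsSent.size();
    }

    // Mirrors the idea of the fix: drain the iterator so the side effects run.
    static int repairsWhenConsuming(List<String> rows)
    {
        List<String> repairsSent = new ArrayList<>();
        Iterator<String> merged = resolve(rows, repairsSent);
        while (merged.hasNext())
            merged.next();
        return repairsSent.size();
    }

    public static void main(String[] args)
    {
        List<String> rows = Arrays.asList("k1", "k2", "k3");
        System.out.println(repairsWithoutConsuming(rows)); // prints 0
        System.out.println(repairsWhenConsuming(rows));    // prints 3
    }
}
```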


> Read repair is not blocking repair to finish in foreground repair
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-13115
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13115
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: ccm on OSX 
>            Reporter: Xiaolong Jiang
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0.x, 3.x
>
>
> The code that waits (blocks) for the repair results to come back in 3.X is below:
> {code:title= DataResolver.java|borderStyle=solid}
>         public void close()
>         {
>             try
>             {
>                 FBUtilities.waitOnFutures(repairResults, DatabaseDescriptor.getWriteRpcTimeout());
>             }
>             catch (TimeoutException ex)
>             {
>                 // We got all responses, but timed out while repairing
>                 int blockFor = consistency.blockFor(keyspace);
>                 if (Tracing.isTracing())
>                     Tracing.trace("Timed out while read-repairing after receiving all {} data and digest responses", blockFor);
>                 else
>                     logger.debug("Timeout while read-repairing after receiving all {} data and digest responses", blockFor);
>                 throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
>             }
>         }
> {code}
> This is in the DataResolver class, but this close method is never called, and it is also not closed automatically (RepairMergeListener does not extend AutoCloseable/CloseableIterator), which means we never wait for the repair to finish before returning the final result.
> The steps to reproduce:
> 1. create some keyspace/table with RF = 2
> 2. start 2 nodes using ccm
> 3. stop node2
> 4. disable hinted handoff on node1
> 5. write some data to node1 with consistency level ONE
> 6. start node2
> 7. query some data from node1 
> This should trigger read repair. I added some logging to the close method above, and the log output never appears.
> So this bug basically violates the "monotonic quorum reads" guarantee.
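
The description above comes down to a listener whose blocking close() is never invoked. A minimal toy sketch of that pattern follows. The names are hypothetical and are not the Cassandra classes, and try-with-resources is shown only as one way to guarantee the call, not as the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model only: hypothetical names, not the Cassandra classes.
class BlockingCloseSketch
{
    // Stand-in for a merge listener that collects repair acknowledgements.
    static class RepairListener implements AutoCloseable
    {
        private final List<Future<?>> repairResults = new ArrayList<>();
        private final long timeoutMillis;
        private boolean waited = false;

        RepairListener(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

        void onRepairSent(Future<?> ack) { repairResults.add(ack); }

        boolean hasWaited() { return waited; }

        // Blocks until every repair is acknowledged or the shared deadline passes.
        @Override
        public void close() throws TimeoutException
        {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            try
            {
                for (Future<?> ack : repairResults)
                    ack.get(Math.max(0, deadline - System.nanoTime()), TimeUnit.NANOSECONDS);
            }
            catch (InterruptedException | ExecutionException e)
            {
                throw new IllegalStateException(e);
            }
            waited = true;
        }
    }

    // Mirrors the reported bug: close() is never invoked, so nothing waits.
    static boolean readWithoutClose(Future<?> ack, long timeoutMillis)
    {
        RepairListener listener = new RepairListener(timeoutMillis);
        listener.onRepairSent(ack);
        return listener.hasWaited();
    }

    // try-with-resources guarantees close() runs before the result is returned.
    // Returns false if the wait timed out.
    static boolean readWithClose(Future<?> ack, long timeoutMillis)
    {
        RepairListener listener = new RepairListener(timeoutMillis);
        try (RepairListener l = listener)
        {
            l.onRepairSent(ack);
        }
        catch (TimeoutException e)
        {
            return false;
        }
        return listener.hasWaited();
    }

    public static void main(String[] args)
    {
        CompletableFuture<Void> ack = CompletableFuture.completedFuture(null);
        System.out.println(readWithoutClose(ack, 1000)); // prints false
        System.out.println(readWithClose(ack, 1000));    // prints true
    }
}
```

In this toy model, a read that skips close() can return before the other replica has acknowledged the repair, which is the window in which a later read could observe older data.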


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)