Date: Tue, 10 Jan 2017 23:03:58 +0000 (UTC)
From: "Xiaolong Jiang (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-13115) Read repair is not blocking repair to finish in foreground repair
archived-at: Tue, 10 Jan 2017 23:04:01 -0000

[ https://issues.apache.org/jira/browse/CASSANDRA-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaolong Jiang updated CASSANDRA-13115:
---------------------------------------
    Description:
The code that is meant to wait (block) for the repair results to come back in 3.X is
below:
{{public void close()
{
    try
    {
        FBUtilities.waitOnFutures(repairResults, DatabaseDescriptor.getWriteRpcTimeout());
    }
    catch (TimeoutException ex)
    {
        // We got all responses, but timed out while repairing
        int blockFor = consistency.blockFor(keyspace);
        if (Tracing.isTracing())
            Tracing.trace("Timed out while read-repairing after receiving all {} data and digest responses", blockFor);
        else
            logger.debug("Timeout while read-repairing after receiving all {} data and digest responses", blockFor);
        throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
    }
}
}}
in the DataResolver class, but this close method is never called, and it is not closed automatically either (RepairMergeListener does not extend AutoCloseable/CloseableIterator), which means we never wait for the repair to finish before returning the final result.

The steps to reproduce:
1. create a keyspace/table with RF = 2
2. start 2 nodes using ccm
3. stop node2
4. disable hinted handoff on node1
5. write some data to node1 with consistency level ONE
6. start node2
7. query some data from node1

This should trigger read repair. I added a log statement to the close method above, and it never shows up in the log.
So this bug basically violates the "monotonic quorum reads" guarantee.
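To illustrate the AutoCloseable point, here is a minimal, self-contained Java sketch. It is not Cassandra code and not the fix for this ticket: RepairListenerSketch, readWithBlockingRepair and the waited flag are hypothetical names, and plain CompletableFuture objects stand in for the repair futures. It only shows that a listener which implements AutoCloseable and is opened in a try-with-resources block has close() invoked before the caller returns, so the wait on the repair futures cannot be skipped.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a repair listener; not the Cassandra class. */
final class RepairListenerSketch implements AutoCloseable
{
    private final List<CompletableFuture<Void>> repairResults;
    private final long timeoutMillis;
    boolean waited = false; // becomes true once close() has seen every repair complete

    RepairListenerSketch(List<CompletableFuture<Void>> repairResults, long timeoutMillis)
    {
        this.repairResults = repairResults;
        this.timeoutMillis = timeoutMillis;
    }

    /** Blocks until all repairs complete or the shared deadline passes. */
    @Override
    public void close() throws TimeoutException
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (CompletableFuture<Void> repair : repairResults)
        {
            long remaining = Math.max(deadline - System.nanoTime(), 0);
            try
            {
                repair.get(remaining, TimeUnit.NANOSECONDS);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            catch (ExecutionException e)
            {
                throw new IllegalStateException(e.getCause());
            }
        }
        waited = true;
    }

    /** Returns true only if close() ran and every repair finished in time. */
    static boolean readWithBlockingRepair(List<CompletableFuture<Void>> repairs, long timeoutMillis)
            throws TimeoutException
    {
        RepairListenerSketch listener = new RepairListenerSketch(repairs, timeoutMillis);
        try (RepairListenerSketch open = listener)
        {
            // Merging replica responses and scheduling repair writes would happen here.
        }
        // try-with-resources has already called close() at this point.
        return listener.waited;
    }

    public static void main(String[] args)
    {
        CompletableFuture<Void> finished = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> neverFinishes = new CompletableFuture<>();
        try
        {
            System.out.println("waited: " + readWithBlockingRepair(Collections.singletonList(finished), 100));
            readWithBlockingRepair(Collections.singletonList(neverFinishes), 50);
        }
        catch (TimeoutException e)
        {
            System.out.println("repair did not finish before the timeout");
        }
    }
}
```

In the sketch, a repair that never completes surfaces as a TimeoutException from the read path, which corresponds to the ReadTimeoutException branch in the close method quoted above.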
> Read repair is not blocking repair to finish in foreground repair
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-13115
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13115
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: ccm on OSX
>            Reporter: Xiaolong Jiang
>
> The code that is meant to wait (block) for the repair results to come back in 3.X is below:
> {{public void close()
> {
>     try
>     {
>         FBUtilities.waitOnFutures(repairResults, DatabaseDescriptor.getWriteRpcTimeout());
>     }
>     catch (TimeoutException ex)
>     {
>         // We got all responses, but timed out while repairing
>         int blockFor = consistency.blockFor(keyspace);
>         if (Tracing.isTracing())
>             Tracing.trace("Timed out while read-repairing after receiving all {} data and digest responses", blockFor);
>         else
>             logger.debug("Timeout while read-repairing after receiving all {} data and digest responses", blockFor);
>         throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
>     }
> }
> }}
> in the DataResolver class, but this close method is never called, and it is not closed automatically either (RepairMergeListener does not extend AutoCloseable/CloseableIterator), which means we never wait for the repair to finish before returning the final result.
> The steps to reproduce:
> 1. create a keyspace/table with RF = 2
> 2. start 2 nodes using ccm
> 3. stop node2
> 4. disable hinted handoff on node1
> 5. write some data to node1 with consistency level ONE
> 6. start node2
> 7. query some data from node1
> This should trigger read repair. I added a log statement to the close method above, and it never shows up in the log.
> So this bug basically violates the "monotonic quorum reads" guarantee.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)