Date: Fri, 15 Sep 2017 19:30:01 +0000 (UTC)
From: "Aleksey Yeschenko (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-12872) Fix short read protection when more than one row is missing

    [ https://issues.apache.org/jira/browse/CASSANDRA-12872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168131#comment-16168131 ]

Aleksey Yeschenko edited comment on CASSANDRA-12872 at 9/15/17 7:29 PM:
------------------------------------------------------------------------

Thanks, committed as [9ea61305ec30a476f48320c06f56d8d67000bbbe|https://github.com/apache/cassandra/commit/9ea61305ec30a476f48320c06f56d8d67000bbbe] and merged with 3.11 and trunk, with grammar corrected. Or altered, anyway.

dtest committed as [9119cdfe921a2f39a315badd58900a12409d506e|https://github.com/apache/cassandra-dtest/commit/9119cdfe921a2f39a315badd58900a12409d506e].

All unit tests are passing, although a few flaked out on CircleCI and had to be rerun locally. {{CommitLogSegmentManagerTest}} in 3.0, {{DeleteTest}}, {{PreparedStatementsTest}}, and {{RemoveTest}} on 3.11, and {{ViewTest}} in 4.0.
Links to runs: [3.0|https://circleci.com/gh/iamaleksey/cassandra/30], [3.11|https://circleci.com/gh/iamaleksey/cassandra/31], [4.0|https://circleci.com/gh/iamaleksey/cassandra/32].

On dtests front, there is one new legit failure - in {{consistency_test.py:TestConsistency.test_13747}}. Fixing the bug with counting exposed yet another bug in SRP and potentially an orthogonal issue with the read path. It has to do with how we handle {{EMPTY}} clustering and bounds. See CASSANDRA-13880.


was (Author: iamaleksey):
    Thanks, committed as [9ea61305ec30a476f48320c06f56d8d67000bbbe|https://github.com/apache/cassandra/commit/9ea61305ec30a476f48320c06f56d8d67000bbbe] and merged with 3.11 and trunk, with grammar corrected. Or altered, anyway.

All unit tests are passing, although a few flaked out on CircleCI and had to be rerun locally. {{CommitLogSegmentManagerTest}} in 3.0, {{DeleteTest}}, {{PreparedStatementsTest}}, and {{RemoveTest}} on 3.11, and {{ViewTest}} in 4.0.
Links to runs: [3.0|https://circleci.com/gh/iamaleksey/cassandra/30], [3.11|https://circleci.com/gh/iamaleksey/cassandra/31], [4.0|https://circleci.com/gh/iamaleksey/cassandra/32].

On dtests front, there is one new legit failure - in {{consistency_test.py:TestConsistency.test_13747}}. Fixing the bug with counting exposed yet another bug in SRP and potentially an orthogonal issue with the read path. It has to do with how we handle {{EMPTY}} clustering and bounds.

I will elaborate very soon, and link to the new JIRA issues here. For the time being, that dtest will be failing - but I have a fix already in the works.

> Fix short read protection when more than one row is missing
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-12872
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12872
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Bhaskar Muppana
>            Assignee: Aleksey Yeschenko
>            Priority: Critical
>              Labels: Correctness
>             Fix For: 3.0.15, 3.11.1
>
>         Attachments: limiterr-reproduce.sh
>
>
> We are seeing paging/limit reads miss a small number of columns. This happens even on a single-DC cluster when both reads and writes are done with QUORUM. I have attached the ccm-based script which reproduces the problem.
> * Keyspace RF - 3
> * Table (id int, course text, marks int, primary key(id, course))
> * replicas for partition key 1 - r1, r2 and r3
> * insert (1, '1', 1), (1, '2', 2), (1, '3', 3), (1, '4', 4), (1, '5', 5) - succeeded on all 3 replicas
> * insert (1, '6', 6) succeeded on r1 and r3, failed on r2
> * delete (1, '2'), (1, '3'), (1, '4'), (1, '5') succeeded on r1 and r2, failed on r3
> * insert (1, '7', 7) succeeded on r1 and r2, failed on r3
> Local data on the 3 nodes now looks like this:
> r1: (1, '1', 1), tombstone(2-5 records), (1, '6', 6), (1, '7', 7)
> r2: (1, '1', 1), tombstone(2-5 records), (1, '7', 7)
> r3: (1, '1', 1), (1, '2', 2), (1, '3', 3), (1, '4', 4), (1, '5', 5), (1, '6', 6)
> If we do a paging read with page_size 2, and it gets data from r2 and r3, then it will only get (1, '1', 1) and (1, '7', 7), skipping record 6. The same problem occurs if the query does not page but has its limit set to 2 records.
> The resolution code for reads works the same for paging queries and normal queries. The coordinator shouldn't respond to the client with records/columns for which it didn't have complete visibility on all required replicas (in this case 2 replicas). In the above case, it sends record (1, '7', 7) back to the client, but its visibility on r3 is limited to (1, '2', 2), and it relies on r2's data alone to assume that (1, '6', 6) doesn't exist, which is wrong. At the end of the resolution, all it can conclusively say anything about is (1, '1', 1), which exists, and (1, '2', 2), which is deleted.
> Ideally, we should have a different resolution implementation for paging/limit queries.
> We could reproduce this on 2.0.17, 2.1.16 and 3.0.9.
> It seems that 3.0.9 has a ShortReadProtection transformation on list queries. I assume that is to protect against cases like the above, but we can reproduce the issue in 3.0.9 as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org
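
To make the scenario in the quoted issue description concrete, here is a minimal Python sketch of a limit-2 read against r2 and r3. It is a toy model and not Cassandra's code: the names (replica_read, merge, naive_read, protected_read), the per-row tombstones, and the timestamps are all assumptions made for illustration. naive_read merges one limited response per replica and shows the skipped row. protected_read illustrates the general idea behind short read protection, which is to trust the merged result only up to the last key returned by a replica that still has more data, and to ask that replica for more rows when too few live rows remain. The actual fix committed for CASSANDRA-12872 may work differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: clustering key -> (write timestamp, value).
# A value of None stands for a row tombstone.
Cell = Tuple[int, Optional[int]]
Replica = Dict[str, Cell]

TOMBSTONE: Cell = (3, None)
# r2 missed the insert of '6'; r3 missed the deletes and the insert of '7'.
R2: Replica = {"1": (1, 1), "2": TOMBSTONE, "3": TOMBSTONE,
               "4": TOMBSTONE, "5": TOMBSTONE, "7": (4, 7)}
R3: Replica = {str(i): (1, i) for i in range(1, 6)}
R3["6"] = (2, 6)


def replica_read(replica: Replica, limit: int, after: str = "") -> Replica:
    """Return cells after `after` in key order, stopping at `limit` live rows."""
    out: Replica = {}
    live = 0
    for key in sorted(replica):
        if key <= after:
            continue
        out[key] = replica[key]
        if replica[key][1] is not None:
            live += 1
            if live == limit:
                break
    return out


def merge(responses: List[Replica]) -> Replica:
    """Reconcile responses: the cell with the highest timestamp wins per key."""
    merged: Replica = {}
    for resp in responses:
        for key, cell in resp.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    return merged


def live_rows(merged: Replica, limit: int, bound: Optional[str] = None) -> list:
    """Live rows in key order, optionally only up to `bound`, cut to `limit`."""
    rows = [(k, c[1]) for k, c in sorted(merged.items())
            if c[1] is not None and (bound is None or k <= bound)]
    return rows[:limit]


def naive_read(replicas: List[Replica], limit: int) -> list:
    """Merge one limited response per replica and apply the limit (the bug)."""
    return live_rows(merge([replica_read(r, limit) for r in replicas]), limit)


def protected_read(replicas: List[Replica], limit: int) -> list:
    """Only trust keys up to the shortest unexhausted response; refetch if short."""
    responses = [replica_read(r, limit) for r in replicas]
    while True:
        merged = merge(responses)
        # A replica is unexhausted while its response is a strict prefix of its data.
        pending = [i for i, r in enumerate(replicas) if len(responses[i]) < len(r)]
        if not pending:
            return live_rows(merged, limit)
        bound = min(max(responses[i]) for i in pending)
        trusted = live_rows(merged, limit, bound)
        if len(trusted) >= limit:
            return trusted
        for i in pending:
            if max(responses[i]) == bound:
                responses[i].update(replica_read(replicas[i], limit, after=bound))


if __name__ == "__main__":
    print("naive:    ", naive_read([R2, R3], 2))
    print("protected:", protected_read([R2, R3], 2))
```

With the data above, naive_read([R2, R3], 2) returns rows '1' and '7', which matches the behaviour reported in the description, while protected_read([R2, R3], 2) returns rows '1' and '6'.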