Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Reply-To: dev@cassandra.apache.org
Date: Wed, 27 Jan 2016 09:37:39 +0000 (UTC)
From: "Benjamin Lerer (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-10010) Paging on DISTINCT queries repeats result when first row in partition changes

     [ https://issues.apache.org/jira/browse/CASSANDRA-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Lerer updated CASSANDRA-10010:
---------------------------------------
    Attachment: 10010-2.2.txt

|[utest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-10010-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-10010-dtest/]

The problem seems to affect only range queries.
The patch makes sure that, for DISTINCT range queries, only the partition keys are compared and not the rows.

The PR for the DTest is [here|https://github.com/riptano/cassandra-dtest/pull/774]. It adds some testing for multi-partition queries to the original test of [~thobbs].

> Paging on DISTINCT queries repeats result when first row in partition changes
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10010
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10010
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Tyler Hobbs
>            Assignee: Benjamin Lerer
>            Priority: Minor
>             Fix For: 2.1.x, 2.2.x
>
>         Attachments: 10010-2.2.txt
>
>
> When paging, we always check new pages to see if they start with the same row that the previous page ended with, and if so, we trim that row to avoid duplicates. With {{DISTINCT}} queries, we only fetch the first row in each partition. If that row happens to change (it's deleted, or another row is inserted at the front of the partition) in between fetching the two pages, our check for a matching row will fail, resulting in a duplicate row being returned.
> It seems like the correct fix is to handle {{DISTINCT}} queries specially and only check to see if the partition key matches the last returned one instead of checking that the rows match.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
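
For illustration only, the duplicate check described in the issue can be modelled with a small Python sketch. This is not the code from the attached patch: the Row type, the trim_first_if_duplicate function, and the distinct flag are hypothetical names made up for this example, and Cassandra's actual pager works on its own internal types.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    """Hypothetical stand-in for a returned row: partition key plus clustering value."""
    partition_key: str
    clustering: int


def trim_first_if_duplicate(
    page: List[Row], last_returned: Optional[Row], distinct: bool
) -> List[Row]:
    """Drop the first row of a new page if it repeats the previous page's last result."""
    if not page or last_returned is None:
        return page
    first = page[0]
    if distinct:
        # DISTINCT returns one result per partition, so the partition key
        # alone identifies a result that has already been returned.
        is_duplicate = first.partition_key == last_returned.partition_key
    else:
        # Regular paging: compare the whole row.
        is_duplicate = first == last_returned
    return page[1:] if is_duplicate else page


# The previous page ended with the first row of partition "pk1".
last = Row("pk1", 0)
# Before the next page is fetched, that row is deleted, so the first row
# of "pk1" is now a different one.
new_page = [Row("pk1", 1), Row("pk2", 0)]

by_row = trim_first_if_duplicate(new_page, last, distinct=False)
by_key = trim_first_if_duplicate(new_page, last, distinct=True)
```

In this toy model the row comparison leaves "pk1" in the second page, so the partition is returned twice, while the partition key comparison trims it.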