Date: Mon, 9 Mar 2015 12:16:39 +0000 (UTC)
From: "mck (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Subject: [jira] [Commented] (CASSANDRA-8574) Gracefully degrade SELECT when there are lots of tombstones

    [ https://issues.apache.org/jira/browse/CASSANDRA-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352873#comment-14352873 ]

mck commented on CASSANDRA-8574:
--------------------------------

> The problem with both of these so far, is that a single partition key with too many tombstones can make the query job fail hard.

Is the problem purely the tombstones, or could it be that tombstones increase occurrence of short reads?
*If* short reads are the underlying problem, then there is a possible substantial improvement¹ in the code: instead of having to completely retry each short read with a larger pager value, StorageProxy could indicate back to the pager that this is a short read (as opposed to end of results) and that the pager should continue.

¹ Say, for example, a 10k row is selected and each query with page size 100 is short and requires an additional read: you're going to be running one hundred extra queries (double the load on the cluster). Being able to indicate back to the pager that it isn't yet exhausted would mean only one extra query on the tail end.

> Gracefully degrade SELECT when there are lots of tombstones
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-8574
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8574
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jens Rantil
>             Fix For: 3.0
>
>
> *Background:* There's lots of tooling out there to do BigData analysis on Cassandra clusters. Examples are Spark and Hadoop, which is offered by DSE. The problem with both of these so far, is that a single partition key with too many tombstones can make the query job fail hard.
> The described scenario happens despite the user setting a rather small FetchSize. I assume this is a common scenario if you have larger rows.
> *Proposal:* To allow a CQL SELECT to gracefully degrade to only return a smaller batch of results if there are too many tombstones. The tombstones are ordered according to clustering key and one should be able to page through them. Potentially:
> SELECT * FROM mytable LIMIT 1000 TOMBSTONES;
> would page through maximum 1000 tombstones, _or_ 1000 (CQL) rows.
> I understand that this obviously would degrade performance, but it would at least yield a result.
> *Additional comment:* I haven't dug into Cassandra code, but conceptually I guess this would be doable.
> Let me know what you think.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
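
The query-count argument in mck's footnote can be checked with a small simulation. This is only a sketch under stated assumptions, not Cassandra code: `FakeReplica`, `page_with_retry` and `page_with_continue` are hypothetical names, and the model assumes every read comes back exactly one row short of its limit and that a retry doubles the limit.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FakeReplica:
    """Hypothetical stand-in for the storage layer (not a Cassandra class).

    Assumption: every read returns `shortfall` rows fewer than requested,
    for example because tombstones shadowed some of the rows scanned.
    """
    live_rows: int
    shortfall: int = 1
    queries: int = 0

    def read(self, start: int, limit: int) -> Tuple[int, bool]:
        """Return (rows_returned, exhausted) for a read starting at `start`."""
        self.queries += 1
        remaining = self.live_rows - start
        returned = min(max(limit - self.shortfall, 0), remaining)
        return returned, returned == remaining


def page_with_retry(replica: FakeReplica, page_size: int) -> int:
    """No short-read signal: a short page is ambiguous (short read or end
    of results?), so the whole read is re-run with a larger limit."""
    fetched = 0
    while True:
        got, _ = replica.read(fetched, page_size)  # flag ignored in this mode
        if got < page_size:
            # Assumed retry policy: same start position, doubled limit.
            got, _ = replica.read(fetched, page_size * 2)
            got = min(got, page_size)
        fetched += got
        if got < page_size:  # still short after the retry: end of results
            return fetched


def page_with_continue(replica: FakeReplica, page_size: int) -> int:
    """Short-read signal available: accept the short page and carry on
    from where it stopped until the replica reports exhaustion."""
    fetched = 0
    while True:
        got, exhausted = replica.read(fetched, page_size)
        fetched += got
        if exhausted:
            return fetched


if __name__ == "__main__":
    for pager in (page_with_retry, page_with_continue):
        replica = FakeReplica(live_rows=10_000)
        rows = pager(replica, 100)
        print(pager.__name__, "rows:", rows, "queries:", replica.queries)
```

Under these assumptions the retry pager issues 202 queries for 10,000 rows at page size 100, while the continuing pager issues 102, i.e. one hundred fewer. The exact tail-end overhead depends on how short each read is, which the model fixes at one row.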