Reply-To: dev@cassandra.apache.org
Date: Mon, 9 Mar 2015 12:16:39 +0000 (UTC)
From: "mck (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-8574) Gracefully degrade SELECT when there are lots of tombstones


    [ https://issues.apache.org/jira/browse/CASSANDRA-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352873#comment-14352873 ]

mck commented on CASSANDRA-8574:
--------------------------------

> The problem with both of these so far, is that a single partition key with too many tombstones can make the query job fail hard.

Is the problem purely the tombstones, or could it be that tombstones increase the occurrence of short reads?

*If* short reads are the underlying problem, then there is a possible substantial improvement¹ in the code: instead of having to completely retry each short read with a larger pager value, StorageProxy could indicate back to the pager that this is a short read (as opposed to the end of results) and that the pager should continue.

 ¹ Say, for example, a 10k row is selected and each query with page size 100 is short and requires an additional read: you're going to be running one hundred extra queries (double the load on the cluster). Being able to indicate back to the pager that it isn't yet exhausted would mean only one extra query on the tail end.
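
The arithmetic in the footnote can be checked with a toy model. The sketch below is not Cassandra code: `Replica`, `read`, `page_with_retry` and `page_with_continuation` are hypothetical names, a "short read" is simplified to "a scan of `limit` cells returned fewer than `limit` live rows", and the retry policy (double the limit) is an assumption. The partition is laid out so that every page of 100 comes back short: one tombstone followed by 100 live rows, repeated 100 times.

```python
TOMBSTONE = None  # marker for a deleted cell in this toy model


def build_partition(live_rows, rows_per_tombstone):
    """One tombstone followed by `rows_per_tombstone` live rows, repeated."""
    cells, row_id = [], 0
    while row_id < live_rows:
        cells.append(TOMBSTONE)
        for _ in range(min(rows_per_tombstone, live_rows - row_id)):
            cells.append(row_id)
            row_id += 1
    return cells


class Replica:
    """Hypothetical stand-in for the storage layer; counts the reads it serves."""

    def __init__(self, cells):
        self.cells = cells
        self.queries = 0

    def read(self, start, limit):
        """Scan `limit` cells from `start`; return (live (index, row) pairs, next position)."""
        self.queries += 1
        end = min(start + limit, len(self.cells))
        live = [(i, self.cells[i]) for i in range(start, end)
                if self.cells[i] is not TOMBSTONE]
        return live, end


def page_with_retry(replica, page_size):
    """Current behaviour as described above: a short read is redone with a larger limit."""
    rows, pos, total = [], 0, len(replica.cells)
    while pos < total:
        limit = page_size
        while True:
            live, end = replica.read(pos, limit)
            if len(live) >= page_size or end == total:
                break
            limit *= 2  # assumed retry policy: repeat the whole read, twice as wide
        rows.extend(row for _, row in live[:page_size])
        # Resume right after the last row handed to the client.
        pos = live[page_size - 1][0] + 1 if len(live) > page_size else end
    return rows


def page_with_continuation(replica, page_size):
    """Proposed behaviour: a short read means 'not exhausted', so the pager just continues."""
    rows, pos, total = [], 0, len(replica.cells)
    while pos < total:
        live, pos = replica.read(pos, page_size)
        rows.extend(row for _, row in live)
    return rows


cells = build_partition(live_rows=10_000, rows_per_tombstone=100)
retrying, continuing = Replica(cells), Replica(cells)
page_with_retry(retrying, page_size=100)
page_with_continuation(continuing, page_size=100)
print(retrying.queries, continuing.queries)  # prints "200 101"
```

Without tombstones the same data would need 100 reads, so in this model the retry approach doubles the load while the continuation approach adds a single read at the tail. The trade-off in the sketch is that pages returned to the client under continuation may hold fewer than `page_size` rows.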

> Gracefully degrade SELECT when there are lots of tombstones
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-8574
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8574
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jens Rantil
>             Fix For: 3.0
>
>
> *Background:* There's lots of tooling out there to do BigData analysis on Cassandra clusters. Examples are Spark and Hadoop, which is offered by DSE. The problem with both of these so far, is that a single partition key with too many tombstones can make the query job fail hard.
> The described scenario happens despite the user setting a rather small FetchSize. I assume this is a common scenario if you have larger rows.
> *Proposal:* To allow a CQL SELECT to gracefully degrade to only return a smaller batch of results if there are too many tombstones. The tombstones are ordered according to clustering key and one should be able to page through them. Potentially:
>     SELECT * FROM mytable LIMIT 1000 TOMBSTONES;
> would page through at most 1000 tombstones, _or_ 1000 (CQL) rows.
> I understand that this obviously would degrade performance, but it would at least yield a result.
> *Additional comment:* I haven't dug into Cassandra code, but conceptually I guess this would be doable. Let me know what you think.

