Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <r2rd9d57b0c1005060917z71112bc9rb399b6c17c9ef45f@mail.gmail.com>
References: <r2rd9d57b0c1005060917z71112bc9rb399b6c17c9ef45f@mail.gmail.com>
Date: Thu, 6 May 2010 14:56:41 -0700
Message-ID: <n2k10e230a81005061456u49ce582al83ef54f36aa55942@mail.gmail.com>
Subject: Re: pagination through slices with deleted keys
From: Mike Malone <mike@simplegeo.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001636b2af6382d3610485f4048a

--001636b2af6382d3610485f4048a
Content-Type: text/plain; charset=ISO-8859-1

Our solution at SimpleGeo has been to hack Cassandra to (optionally, at
least) be sensible and drop Rows that don't have any Columns. The claim from
the FAQ that "Cassandra would have to check if there are any other columns
in the row" is inaccurate. The common case, for us at least, is that we're
only interested in Rows that have Columns matching our predicate. So if
there aren't any, we just don't return that row. No need to check if the
entire row is deleted.
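
The same rule (skip any row whose slice comes back with no columns) can also
be applied client-side while paging, which is the over-fetch-and-filter
approach from the original question. A minimal sketch, not the server patch
described above: `fetch` is a hypothetical callable standing in for whatever
range-slice call your client exposes, and it is assumed to return rows in
range order with an inclusive start key. `live_rows` and `page` are
illustrative names only.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Tuple

Row = Tuple[str, Dict[str, str]]
# Hypothetical fetcher (an assumption, not a real client API): returns up to
# `count` rows starting at `start_key` (inclusive), in range order.
Fetch = Callable[[str, int], List[Row]]


def live_rows(fetch: Fetch, start_key: str = "", batch: int = 100) -> Iterator[Row]:
    """Yield only rows that have at least one column matching the predicate."""
    if batch < 2:
        # With an inclusive start key, a batch of 1 would only ever return
        # the row we already saw, so paging could not advance.
        raise ValueError("batch must be at least 2")
    first = True
    while True:
        rows = fetch(start_key, batch)
        # Every batch after the first repeats the last key of the previous
        # batch, because the start key is inclusive. Drop that repeat.
        if not first and rows and rows[0][0] == start_key:
            rows = rows[1:]
        if not rows:
            return
        for key, columns in rows:
            if columns:  # an empty slice means a deleted (ghost) row: skip it
                yield key, columns
        start_key = rows[-1][0]
        first = False


def page(fetch: Fetch, n: int, start_key: str = "", batch: int = 100) -> List[Row]:
    """Collect up to `n` live rows, fetching more batches as needed."""
    return list(itertools.islice(live_rows(fetch, start_key, batch), n))
```

Because the generator keeps fetching until it has enough live rows, a page of
N results may cost more than one round trip when many rows in the range are
tombstoned. Counts obtained this way are only as accurate as the reads behind
them.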

Mike

On Thu, May 6, 2010 at 9:17 AM, Ian Kallen <spidaman.list@gmail.com> wrote:

> I read the DistributedDeletes and the range_ghosts FAQ entry on the wiki
> which do a good job describing how difficult deletion is in an eventually
> consistent system. But practical application strategies for dealing with it
> aren't there (that I saw). I'm wondering how folks implement pagination in
> their applications; if you want to render N results in an application, is
> the only solution to over-fetch and filter out the tombstones? Or is there
> something simpler that I overlooked? I'd like to be able to count (even if
> the counts are approximate) and fetch rows with the deleted ones filtered
> out (without waiting for the GCGraceSeconds interval + compaction) but from
> what I see so far, the burden is on the app to deal with the tombstones.
> -Ian
>

--001636b2af6382d3610485f4048a--