Reply-To: user@cassandra.apache.org
In-Reply-To: 
 <CALdd-zidNJLW8Gt8BBceXAtskDt=6pZ+9oibCff8xce+HJ=HHw@mail.gmail.com>
References: 
 <CAHr-TSMBr28321WpfJq7npuxgWzMXemfe1oR8=RcnR0XNo_z1w@mail.gmail.com>
	<CALdd-zidNJLW8Gt8BBceXAtskDt=6pZ+9oibCff8xce+HJ=HHw@mail.gmail.com>
Date: Thu, 28 Jul 2011 15:53:57 +0100
Message-ID: 
 <CAHr-TSPJARa1-BhKagGybObGKVNpKUSymBT2CY1ynEfv2kDX0Q@mail.gmail.com>
Subject: Re: NotFoundException thrown for get(), but not get_slice() with a
 column_names predicate
From: David Allsopp <dnallsopp@gmail.com>
To: user@cassandra.apache.org

I understand, and agree in the case where the slice predicate is a range, but
I'd expect the semantics to be different when the predicate is a list of
column names (even if, under the hood, it is implemented as a range
operation).

If I ask for columns "foo" and "bar", then usually I'm not trying to find
out what's in a particular range - I actually want columns "foo" AND "bar",
i.e. the semantics are basically those of a set of individual column get()
calls.

I could do these as individual get() calls, but want to minimise
round-trips.

I can of course check which columns were returned and then try again or give
up, but this pushes work to the clients. In the worst case it could transfer
large amounts of unusable data back to the client, which then has to discard
it all (and perhaps retry and discard it all over again) because one small
column is absent. It would save a lot of bandwidth to abandon the operation
immediately at the server if a 'missing' column is detected there.
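For illustration, the client-side workaround described above (check which
columns came back, re-request or give up) might look roughly like the sketch
below. It assumes a hypothetical callable `get_slice(names)` standing in for a
get_slice call with a column_names predicate, returning a dict of only those
requested columns that exist; this is not the signature of the Thrift API or
of any client library. Note that it still pays for the first, possibly
incomplete, response; it only avoids re-fetching columns already received.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical stand-in for a get_slice call with a column_names predicate:
# it returns only those requested columns that exist on the replica that
# answered (name -> value); absent columns are simply left out.
SliceFn = Callable[[List[str]], Dict[str, bytes]]


def get_all_columns(get_slice: SliceFn, names: Iterable[str],
                    max_attempts: int = 3) -> Optional[Dict[str, bytes]]:
    """Client-side 'all of these columns or nothing' on top of get_slice."""
    wanted = list(dict.fromkeys(names))  # de-duplicate, keep request order
    found: Dict[str, bytes] = {}
    for _ in range(max_attempts):
        missing = [n for n in wanted if n not in found]
        if not missing:
            break
        # Re-request only the columns still missing, not the whole set.
        result = get_slice(missing)
        for n in missing:
            if n in result:
                found[n] = result[n]
    if all(n in found for n in wanted):
        return found
    return None  # give up: some column never showed up


# Usage with a fake replica that has not yet seen "bar" on the first read.
calls: List[List[str]] = []


def flaky_slice(names: List[str]) -> Dict[str, bytes]:
    calls.append(list(names))
    data = {"foo": b"1", "bar": b"2"}
    if len(calls) == 1:
        del data["bar"]
    return {n: data[n] for n in names if n in data}


result = get_all_columns(flaky_slice, ["foo", "bar"])
print(result)  # {'foo': b'1', 'bar': b'2'}
print(calls)   # [['foo', 'bar'], ['bar']]
```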

Of course, in some use cases one might want whichever of the column names
happen to exist ("foo" AND/OR "bar"), hence my suggestion that it should be
possible to choose between these two semantics when using a column_names
predicate (clearly, this doesn't make sense for a slice_range predicate).
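To make the two proposed semantics concrete, here is a minimal sketch of
evaluating a column_names predicate against a single row held as a dict. The
`require_all` flag, `slice_by_names` and `NotFoundError` are all hypothetical:
they model the option being suggested here, not an existing parameter of
Cassandra's SlicePredicate. The point of checking up front is that nothing is
assembled or sent back when a column is missing.

```python
from typing import Dict, List


class NotFoundError(Exception):
    """Hypothetical analogue of the NotFoundException raised by get()."""


def slice_by_names(row: Dict[str, bytes], names: List[str],
                   require_all: bool = False) -> Dict[str, bytes]:
    """Evaluate a column_names predicate against one row (a sketch).

    require_all=False: return whichever named columns exist (AND/OR).
    require_all=True:  fail, like get(), if any named column is absent (AND).
    """
    if require_all:
        # Check before building the result, so a failed request returns no data.
        missing = [n for n in names if n not in row]
        if missing:
            raise NotFoundError(f"missing columns: {missing}")
    return {n: row[n] for n in names if n in row}


row = {"foo": b"1"}
print(slice_by_names(row, ["foo", "bar"]))  # {'foo': b'1'}
try:
    slice_by_names(row, ["foo", "bar"], require_all=True)
except NotFoundError as e:
    print(e)  # missing columns: ['bar']
```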

On 28 July 2011 13:45, Jonathan Ellis <jbellis@gmail.com> wrote:

> No, the slice semantics are "give me whatever happens to exist between
> start and end."  It's valid for the answer to be "nothing."
>
> On Thu, Jul 28, 2011 at 6:55 AM, David Allsopp <dnallsopp@gmail.com> wrote:
> > If I try to retrieve a column that is not present, using get(), then I'll
> > get a NotFoundException.
> >
> > If (for efficiency's sake) I try to retrieve several named columns using
> > get_slice, with a column_names predicate (i.e. a list of columns) then I
> > won't get the exception if one of those columns is missing, I think?
> >
> > This seems inconsistent - would it make sense for get_slice to throw the
> > exception too, or perhaps have an option to require all columns to be
> > present?
> >
> >
> > The reason this came up is that I write and read with CL.ONE, and retry at
> > the client side in case of (very occasional) failures, with the aim of
> > improving availability and performance by avoiding CL.QUORUM etc.
> > This is easy in the get() case - I can just retry a few times if I get a
> > NotFoundException. I normally only need to retry once, in less than 0.1% of
> > cases.
> >
> > For the get_slice case I'd need to retrieve all the columns again (might be
> > wasteful) or check which ones were returned and form a new request (seems
> > overly complex) or give up using get_slice and just use individual get()
> > calls (seems inefficient).
> >
> > See also https://issues.apache.org/jira/browse/CASSANDRA-518
> >
> > Thanks,
> >
> > David.
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
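The retry pattern described in the quoted message (read with get() at CL.ONE
and retry a few times if it reports the column as missing) could be sketched
as follows. `get` is a hypothetical zero-argument callable wrapping a single
read, and `NotFoundError` is a local stand-in for the client's
NotFoundException; neither is a real client-library name. One caveat: at
CL.ONE a retry cannot tell "not yet replicated" apart from "never written",
so the final failure still has to be treated as ambiguous.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class NotFoundError(Exception):
    """Hypothetical stand-in for the NotFoundException that get() raises."""


def get_with_retry(get: Callable[[], T], attempts: int = 3) -> T:
    """Retry a CL.ONE read a few times before treating the column as absent."""
    for attempt in range(attempts):
        try:
            return get()
        except NotFoundError:
            if attempt == attempts - 1:
                raise  # still missing after every attempt: give up
    raise ValueError("attempts must be at least 1")


# Usage: the first read hits a replica that has not seen the write yet.
replies = iter([NotFoundError("col"), b"value"])


def flaky_get() -> bytes:
    r = next(replies)
    if isinstance(r, Exception):
        raise r
    return r


print(get_with_retry(flaky_get))  # b'value'
```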
