From: David Allsopp
To: user@cassandra.apache.org
Date: Thu, 28 Jul 2011 15:53:57 +0100
Subject: Re: NotFoundException thrown for get(), but not get_slice() with a column_names predicate

I understand and agree for the case where the slice predicate is a range, but I'd expect the semantics to be different where the predicate is a list of column names (even if it's implemented using a range operation under the hood?)

If I ask for columns "foo" and "bar", then usually I'm not trying to find out what's in a particular range - I actually want columns "foo" AND "bar", i.e. the semantics are basically those of a set of individual column get() calls.

I could do these as individual get() calls, but want to minimise round-trips.

I can of course check which columns were returned and try again or give up, but this pushes work to the clients; in the worst case this could transfer large amounts of unusable data back to the client, which then has to discard it all (and perhaps retry and discard all over again) due to the absence of one small column. It would save a lot of bandwidth to abandon the operation immediately at the server if a 'missing' column is detected there.

Of course, in some use cases one might want to get whichever of the named columns happen to exist ("foo" AND/OR "bar"), hence my suggestion that it should be possible to choose between these two semantics when using a column_names predicate (clearly, this doesn't make sense for a slice_range predicate).

On 28 July 2011 13:45, Jonathan Ellis wrote:
> No, the slice semantics are "give me whatever happens to exist between
> start and end." It's valid for the answer to be "nothing."
>
> On Thu, Jul 28, 2011 at 6:55 AM, David Allsopp wrote:
> > If I try to retrieve a column that is not present, using get(), then I'll
> > get a NotFoundException.
> >
> > If (for efficiency's sake) I try to retrieve several named columns using
> > get_slice, with a column_names predicate (i.e. a list of columns) then I
> > won't get the exception if one of those columns is missing, I think?
> >
> > This seems inconsistent - would it make sense for get_slice to throw the
> > exception too, or perhaps have an option to require all columns to be
> > present?
> >
> > The reason this came up is that I write and read with CL.ONE, and retry at
> > the client side in case of (very occasional) failures, with the aim of
> > improving availability and performance by avoiding CL.QUORUM etc.
> > This is easy in the get() case - I can just retry a few times if I get a
> > NotFoundException. I normally only need to retry once, in less than 0.1% of
> > cases.
> >
> > For the get_slice case I'd need to retrieve all the columns again (might be
> > wasteful) or check which ones were returned and form a new request (seems
> > overly complex) or give up using get_slice and just use individual get()
> > calls (seems inefficient).
> >
> > See also https://issues.apache.org/jira/browse/CASSANDRA-518
> >
> > Thanks,
> >
> > David.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

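For illustration, the client-side workaround discussed in this thread (compare the returned columns against the requested names, then re-request only the missing ones) might look roughly like the sketch below. It is not Cassandra's API: `fetch` is a hypothetical stand-in for whatever call issues a get_slice with a column_names predicate and returns the columns that exist, and `get_columns`, `MissingColumnsError`, `require_all` and `max_attempts` are names invented here. The `require_all` flag mimics the two semantics proposed above ("foo" AND "bar" versus "foo" AND/OR "bar").

```python
from typing import Callable, Dict, Iterable, List


class MissingColumnsError(Exception):
    """Raised when requested columns are still absent after all attempts."""

    def __init__(self, missing: List[str]) -> None:
        super().__init__("missing columns: " + ", ".join(missing))
        self.missing = missing


def get_columns(
    fetch: Callable[[List[str]], Dict[str, bytes]],
    names: Iterable[str],
    require_all: bool = True,
    max_attempts: int = 3,
) -> Dict[str, bytes]:
    """Fetch named columns, re-requesting only those not yet returned.

    `fetch` is a hypothetical callable (an assumption of this sketch) that
    takes a list of column names and returns a dict of the ones that exist.
    """
    wanted = list(dict.fromkeys(names))  # drop duplicates, keep order
    found: Dict[str, bytes] = {}
    missing = wanted
    for _ in range(max_attempts):
        found.update(fetch(missing))
        missing = [name for name in wanted if name not in found]
        # Stop once everything arrived, or after one request when the
        # caller accepts whichever columns happen to exist.
        if not missing or not require_all:
            break
    if missing and require_all:
        raise MissingColumnsError(missing)
    return found
```

Re-requesting only the missing names avoids transferring the already-received columns a second time, but the first response still crosses the network even when it turns out to be incomplete, which is the bandwidth cost a server-side check would avoid.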