incubator-cassandra-user mailing list archives

From David Allsopp <dnalls...@gmail.com>
Subject Re: NotFoundException thrown for get(), but not get_slice() with a column_names predicate
Date Thu, 28 Jul 2011 14:53:57 GMT
I understand and agree for the case where the slice predicate is a range,
but I'd expect the semantics to be different where the predicate is a list
of column names (even if it's implemented using a range operation under the
hood?)

If I ask for columns "foo" and "bar", then usually I'm not trying to find
out what's in a particular range - I actually want columns "foo" AND "bar",
i.e. the semantics are basically those of a set of individual column get()
calls.

I could do these as individual get() calls, but want to minimise
round-trips.

I can of course check which columns were returned and try again or give up,
but this pushes work to the clients; in the worst case this could transfer
large amounts of unusable data back to the client, which then has to discard
it all (and perhaps retry and discard all over again) due to the absence of
one small column. It would save a lot of bandwidth to abandon the operation
immediately at the server if a 'missing' column is detected there.
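
As a rough sketch of the client-side workaround described above (check which columns came back, then re-request only the missing ones), something like the following could be used. The `fetch` callable is a hypothetical stand-in for a get_slice call with a column_names predicate, and `get_all_named` and `max_attempts` are illustrative names, not part of any Cassandra client API.

```python
from typing import Callable, Dict, Iterable

# Hypothetical stand-in for get_slice with a column_names predicate:
# takes a row key and column names, returns only the columns that exist.
FetchFn = Callable[[str, Iterable[str]], Dict[str, bytes]]


def get_all_named(fetch: FetchFn, key: str, names: Iterable[str],
                  max_attempts: int = 3) -> Dict[str, bytes]:
    """Fetch every named column, re-requesting only those still missing."""
    wanted = list(names)
    found: Dict[str, bytes] = {}
    for _ in range(max_attempts):
        missing = [n for n in wanted if n not in found]
        if not missing:
            break
        # Only the missing columns are requested again, so data that
        # already arrived is not transferred a second time.
        found.update(fetch(key, missing))
    missing = [n for n in wanted if n not in found]
    if missing:
        raise KeyError(
            f"columns still missing after {max_attempts} attempts: {missing}")
    return found
```

This avoids re-fetching columns that were already returned, but the first response still carries every column that did exist, so it does not save the bandwidth that a server-side check would.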

Of course, in some use cases one might want to get whichever of the named
columns happen to exist ("foo" AND/OR "bar"), hence my suggestion that it
should be possible to choose between these two semantics when using a
column_names predicate (clearly, this doesn't make sense for a slice_range
predicate).
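
To illustrate the two semantics, here is a minimal sketch of how the choice might look from the caller's side. The `require_all` flag, the `ColumnsNotFound` exception, and the `fetch` callable (standing in for get_slice with a column_names predicate) are all hypothetical; this wrapper only emulates the behaviour on the client, whereas the suggestion above is for the check to happen at the server.

```python
from typing import Callable, Dict, Iterable


class ColumnsNotFound(Exception):
    """Hypothetical analogue of NotFoundException for a list of names."""

    def __init__(self, missing):
        super().__init__(f"missing columns: {sorted(missing)}")
        self.missing = set(missing)


def get_named(fetch: Callable[[str, Iterable[str]], Dict[str, bytes]],
              key: str, names: Iterable[str],
              require_all: bool = False) -> Dict[str, bytes]:
    """Return named columns; with require_all, fail if any is absent."""
    wanted = set(names)
    columns = fetch(key, wanted)
    missing = wanted - set(columns)
    if require_all and missing:
        # "foo" AND "bar": behave like a set of individual get() calls.
        raise ColumnsNotFound(missing)
    # "foo" AND/OR "bar": return whatever happens to exist.
    return columns
```

With `require_all=False` the call returns whatever subset exists, matching the current get_slice behaviour; with `require_all=True` a single missing column raises, as get() does today.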

On 28 July 2011 13:45, Jonathan Ellis <jbellis@gmail.com> wrote:

> No, the slice semantics are "give me whatever happens to exist between
> start and end."  It's valid for the answer to be "nothing."
>
> On Thu, Jul 28, 2011 at 6:55 AM, David Allsopp <dnallsopp@gmail.com> wrote:
> > If I try to retrieve a column that is not present, using get(), then I'll
> > get a NotFoundException.
> >
> > If (for efficiency's sake) I try to retrieve several named columns using
> > get_slice, with a column_names predicate (i.e. a list of columns) then I
> > won't get the exception if one of those columns is missing, I think?
> >
> > This seems inconsistent - would it make sense for get_slice to throw the
> > exception too, or perhaps have an option to require all columns to be
> > present?
> >
> >
> > The reason this came up is that I write and read with CL.ONE, and retry at
> > the client side in case of (very occasional) failures, with the aim of
> > improving availability and performance by avoiding CL.QUORUM etc.
> > This is easy in the get() case - I can just retry a few times if I get a
> > NotFoundException. I normally only need to retry once, in less than 0.1% of
> > cases.
> >
> > For the get_slice case I'd need to retrieve all the columns again (might be
> > wasteful) or check which ones were returned and form a new request (seems
> > overly complex) or give up using get_slice and just use individual get()
> > calls (seems inefficient).
> >
> > See also https://issues.apache.org/jira/browse/CASSANDRA-518
> >
> > Thanks,
> >
> > David.
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
