cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: How does Cassandra optimize this query?
Date Mon, 05 Nov 2012 20:27:26 GMT
> A remark like "maybe we just shouldn't allow that and leave that to the
> map-reduce side" would make sense, but I don't see how this is "misleading".

Yes. Bingo.

It is misleading because it is not useful in any other context besides
someone playing around with a ten row table in cqlsh. CQL stops me
from executing some queries that are not efficient, yet it allows this
one. If I am new to Cassandra and developing, this query works and
produces a result, but once my database holds real data it produces a
different result (likely an empty one).

When I first saw this query two things came to my mind.

1) CQL (and Cassandra) must be somehow indexing all the fields of a
primary key to make this search optimal.

2) This is impossible; CQL must be gathering the first hundred random
rows and searching only those for a match.

What is happening is case #2. In a nutshell, CQL is just sampling
some data and running the query on it. We could support all types of
query constructs if we just took the first 100 rows and applied this
logic to them, but these things are not helpful for anything but light
ad-hoc data exploration.
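
To make case #2 concrete, here is a toy model in Python. It is not
Cassandra code and does not reflect how CQL is implemented; it only
mimics the behavior described above, where a filter is applied to a
bounded sample of rows. The names (sample_then_filter, full_filter,
make_table) and the sample size of 100 are assumptions made up for
this illustration.

```python
# Toy model only: this is NOT Cassandra code. It mimics the behavior
# described above, where a filter runs over a bounded sample of rows.

def sample_then_filter(rows, predicate, sample_size=100):
    """Apply predicate to only the first sample_size rows, in scan order."""
    return [row for row in rows[:sample_size] if predicate(row)]


def full_filter(rows, predicate):
    """Apply predicate to every row (what the syntax suggests happens)."""
    return [row for row in rows if predicate(row)]


def make_table(n_rows):
    """Build n_rows rows; the only row tagged 'target' is the last one."""
    rows = [{"id": i, "tag": "other"} for i in range(n_rows - 1)]
    rows.append({"id": n_rows - 1, "tag": "target"})
    return rows


def is_target(row):
    return row["tag"] == "target"


small = make_table(10)       # the ten-row table someone tries in a shell
large = make_table(10_000)   # the same schema once real data arrives

small_sampled = sample_then_filter(small, is_target)
large_sampled = sample_then_filter(large, is_target)
large_full = full_filter(large, is_target)

# Same query, same matching row, different answer once the table grows.
print(len(small_sampled), len(large_sampled), len(large_full))  # 1 0 1
```

On the ten-row table the sampled query finds the matching row; on the
larger table it returns nothing even though the row exists, which is
the "works in development, empty in production" surprise.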

My suggestions:
1) Force people to supply a LIMIT clause on any query that is going to
page over get_range_slice.
2) Add some type of explain support so I can establish if this
query will work in the

I say this because as an end user I do not understand if a given query
is actually going to return the same results with different data.

On Mon, Nov 5, 2012 at 1:40 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>
> On Mon, Nov 5, 2012 at 6:55 PM, Edward Capriolo <edlinuxguru@gmail.com>
> wrote:
>>
>> I see. It is fairly misleading because it is a query that does not
>> work at scale. This syntax is only helpful if you have less than a few
>> thousand rows in Cassandra.
>
>
> Just for the sake of argument, how is that misleading? If you have billions
> of rows and do the select statement from your initial mail, what did the
> syntax lead you to believe it would return?
>
> A remark like "maybe we just shouldn't allow that and leave that to the
> map-reduce side" would make sense, but I don't see how this is "misleading".
>
> But again, this translates directly to a get_range_slice (which doesn't
> scale if you have billions of rows and don't limit the output either), so
> there is nothing new here.
