cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: How does Cassandra optimize this query?
Date Mon, 05 Nov 2012 21:42:56 GMT
I created https://issues.apache.org/jira/browse/CASSANDRA-4915

On Mon, Nov 5, 2012 at 3:27 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>> A remark like "maybe we just shouldn't allow that and leave that to the
>> map-reduce side" would make sense, but I don't see how this is "misleading".
>
> Yes. Bingo.
>
> It is misleading because it is not useful in any context other than
> someone playing around with a ten-row table in cqlsh. CQL stops me
> from executing some queries that are not efficient, yet it allows this
> one. If I am new to Cassandra and developing, this query works and
> produces a result; then, once my database holds real data, it produces
> a different result (likely an empty one).
>
> When I first saw this query two things came to my mind.
>
> 1) CQL (and Cassandra) must somehow be indexing all the fields of a
> primary key to make this search optimal.
>
> 2) This is impossible; CQL must be gathering the first hundred random
> rows and looking for this thing in them.
>
> What is happening is case #2. In a nutshell, CQL is just sampling
> some data and running the query on it. We could support all types of
> query constructs if we just took the first 100 rows and applied this
> logic to them, but these things are not helpful for anything but light
> ad-hoc data exploration.
>
> My suggestions:
> 1) Force people to supply a LIMIT clause on any query that is going to
> page over get_range_slice.
> 2) Have some type of EXPLAIN support so I can establish whether this
> query will work in the
>
> I say this because, as an end user, I cannot tell whether a given query
> is actually going to return the same results with different data.
>
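To make case #2 concrete, here is a toy Python model. It is not Cassandra's actual code path, only a sketch under stated assumptions: rows are kept in hash (token) order, only the first `scan_limit` rows of the range are read, and the non-key predicate is applied to that sample afterwards. The names `toy_token`, `sampled_query`, `make_table` and `scan_limit` are hypothetical, and the 100-row cap is simply the figure used above.

```python
import hashlib


def toy_token(key):
    # Hypothetical stand-in for a partitioner: ordering rows by a hash of the
    # key makes "the first N rows" an effectively random sample of the table.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def sampled_query(rows, predicate, scan_limit=100):
    """Read rows in token order, stop after scan_limit rows, then filter."""
    ordered = sorted(rows, key=lambda r: toy_token(r["id"]))
    return [r for r in ordered[:scan_limit] if predicate(r)]


def make_table(n):
    # Exactly one row (i == 42) matches, no matter how large the table is.
    return [{"id": f"user{i}", "flag": i == 42} for i in range(n)]


wanted = lambda r: r["flag"]
small = sampled_query(make_table(50), wanted)      # all 50 rows are read: match found
large = sampled_query(make_table(10_000), wanted)  # match found only if it hashes into the first 100
print(len(small), len(large))
```

The same predicate finds the matching row on the 50-row table, but on the 10,000-row table it succeeds only if that row happens to hash into the sampled prefix, which is the "works in development, returns something different with real data" behaviour described above.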
> On Mon, Nov 5, 2012 at 1:40 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>
>> On Mon, Nov 5, 2012 at 6:55 PM, Edward Capriolo <edlinuxguru@gmail.com>
>> wrote:
>>>
>>> I see. It is fairly misleading because it is a query that does not
>>> work at scale. This syntax is only helpful if you have fewer than a few
>>> thousand rows in Cassandra.
>>
>>
>> Just for the sake of argument, how is that misleading? If you have billions
>> of rows and run the SELECT statement from your initial mail, what did the
>> syntax lead you to believe it would return?
>>
>> A remark like "maybe we just shouldn't allow that and leave that to the
>> map-reduce side" would make sense, but I don't see how this is "misleading".
>>
>> But again, this translates directly to a get_range_slice (which doesn't
>> scale if you have billions of rows and don't limit the output either), so
>> there is nothing new here.
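Sylvain's point about get_range_slice can be sketched the same way. The toy model below assumes the query is executed as a range scan that is paged until the end of the range, filtering each page, and stops early only once an explicit limit is satisfied. Rows are ordered by key (as with an order-preserving partitioner) purely to keep the example deterministic. `paged_filter_scan`, `make_table`, `page_size` and `limit` are hypothetical names, not Cassandra or Thrift APIs.

```python
def paged_filter_scan(rows, predicate, page_size=1000, limit=None):
    """Toy range scan: read page by page in key order, filter each page, and
    stop early only when an explicit limit is met. Returns (matches, rows_read)."""
    ordered = sorted(rows, key=lambda r: r["id"])
    matches, rows_read = [], 0
    for start in range(0, len(ordered), page_size):
        page = ordered[start:start + page_size]
        rows_read += len(page)
        matches.extend(r for r in page if predicate(r))
        if limit is not None and len(matches) >= limit:
            return matches[:limit], rows_read
    return matches, rows_read


def make_table(n):
    # Zero-padded keys keep string order equal to numeric order; one row matches.
    return [{"id": f"user{i:07d}", "flag": i == 42} for i in range(n)]


wanted = lambda r: r["flag"]
for n in (1_000, 100_000):
    found, read = paged_filter_scan(make_table(n), wanted)
    print(n, len(found), read)  # prints "1000 1 1000", then "100000 1 100000"

# With a LIMIT the scan can stop early, but only because the match sits in the first page.
found, read = paged_filter_scan(make_table(100_000), wanted, limit=1)
print(len(found), read)  # prints "1 1000"
```

Without a limit, the number of rows read grows with the table even though the answer is a single row, which is why such a query does not scale to billions of rows. A LIMIT bounds the work only when matches turn up early in the scan.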
