Ok, I slightly misunderstood your initial complaint, my bad. I largely agree with you, though I'm more conflicted about what the right resolution is. But I'll follow up on the ticket to avoid repetition.


On Mon, Nov 5, 2012 at 10:42 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
I created https://issues.apache.org/jira/browse/CASSANDRA-4915

On Mon, Nov 5, 2012 at 3:27 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>> A remark like "maybe we just shouldn't allow that and leave that to the
>> map-reduce side" would make sense, but I don't see how this is "misleading".
>
> Yes. Bingo.
>
> It is misleading because it is not useful in any other context besides
> someone playing around with a ten row table in cqlsh. CQL stops me
> from executing some queries that are not efficient, yet it allows this
> one. If I am new to Cassandra and developing, this query works and
> produces a result; then, once my database gets real data, it produces a
> different result (likely an empty one).
>
> When I first saw this query two things came to my mind.
>
> 1) CQL (and Cassandra) must be somehow indexing all the fields of a
> primary key to make this search optimal.
>
> 2) This is impossible; CQL must be gathering the first hundred random
> rows and finding this thing.
>
> What is happening is case #2. In a nutshell, CQL is just sampling
> some data and running the query on it. We could support all types of
> query constructs if we just take the first 100 rows and apply this
> logic to it, but these things are not helpful for anything but light
> ad-hoc data exploration.
>
> My suggestions:
> 1) force people to supply a LIMIT clause on any query that is going to
> page over get_range_slice
> 2) having some type of explain support so I can establish if this
> query will work in the
>
> I say this because as an end user I do not understand if a given query
> is actually going to return the same results with different data.
>
> On Mon, Nov 5, 2012 at 1:40 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>>
>> On Mon, Nov 5, 2012 at 6:55 PM, Edward Capriolo <edlinuxguru@gmail.com>
>> wrote:
>>>
>>> I see. It is fairly misleading because it is a query that does not
>>> work at scale. This syntax is only helpful if you have fewer than a few
>>> thousand rows in Cassandra.
>>
>>
>> Just for the sake of argument, how is that misleading? If you have billions
>> of rows and do the select statement from your initial mail, what did the
>> syntax lead you to believe it would return?
>>
>> A remark like "maybe we just shouldn't allow that and leave that to the
>> map-reduce side" would make sense, but I don't see how this is "misleading".
>>
>> But again, this translates directly to a get_range_slice (which doesn't scale
>> if you have billions of rows and don't limit the output either), so there is
>> nothing new here.