Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of hkroger@gmail.com designates
 209.85.215.54 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAF+A=7pG4-fdiJbdhTO1NoZDK_Qnx+ri9_aNH1=ExV_GK2oxng@mail.gmail.com>
References: 
 <CAF+A=7romY+Yfbn25UF5BmDUEF4APNm5DBbV6TQV2Ka4oUJx9Q@mail.gmail.com>
	<CAN-FP2ouCzfiy74r-XpsKagyEcf-hP28o2rNLZymHOv2TpACMQ@mail.gmail.com>
	<CAF+A=7pG4-fdiJbdhTO1NoZDK_Qnx+ri9_aNH1=ExV_GK2oxng@mail.gmail.com>
Date: Mon, 4 Nov 2013 11:14:12 +0200
Message-ID: 
 <CAN-FP2qSSL52bFFidvKVUtYUA4CB7aaUpy4XbQZxhqn8cx=gEQ@mail.gmail.com>
Subject: Re: Bad Request: No indexed columns present in by-columns clause with
 Equal operator?
From: =?ISO-8859-1?Q?Hannu_Kr=F6ger?= <hkroger@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001a1132f26adb0eb304ea5656c9

--001a1132f26adb0eb304ea5656c9
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I tested the same, and it seems that you cannot run such range queries on
indexed columns alone. You probably need at least one equality condition on
an indexed column in the WHERE clause. I am not sure.
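
If that is the rule, then on your original table (primary key (employee_id))
with the two indexes from your message, the query shapes would be roughly as
follows. This is a sketch of how I understand 1.2.x to behave, not something
I re-tested:

```cql
-- Assumes the original test table with primary key (employee_id)
-- and the two secondary indexes proposed in the quoted message.
create index employee_name_idx on test (employee_name);
create index last_modified_date_idx on test (last_modified_date);

-- Equality on an indexed column: the index can serve this directly.
select * from test where employee_name = 'e27';

-- Only range conditions on an indexed column, no equality anywhere:
-- rejected with "No indexed columns present in by-columns clause with
-- Equal operator", even though last_modified_date is indexed.
select * from test
 where last_modified_date > mintimeuuid('2013-11-03 13:33:30')
   and last_modified_date < maxtimeuuid('2013-11-03 13:33:45');

-- Equality on one indexed column plus a range on another: the index
-- narrows the rows and the range is applied as a filter on top,
-- which is why ALLOW FILTERING is needed here.
select * from test
 where employee_name = 'e27'
   and last_modified_date > mintimeuuid('2013-11-03 13:33:30')
   and last_modified_date < maxtimeuuid('2013-11-03 13:33:45')
 allow filtering;
```

So the index only helps when you also know the employee_name; it does not
give you "everything that changed" across all employees.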

You can achieve your goal by defining the primary key as follows:

create table test (
    employee_id text,
    employee_name text,
    value text,
    last_modified_date timeuuid,
    primary key (employee_id, last_modified_date)
   );

and then querying like this:

select * from test
where last_modified_date > mintimeuuid('2013-11-03 13:33:30')
  and last_modified_date < maxtimeuuid('2013-11-05 13:33:45')
ALLOW FILTERING;

However, that will be slow because it has to scan the whole table, which is
why you need to add "ALLOW FILTERING". Without it you get an error:
"Bad Request: Cannot execute this query as it might involve data filtering
and thus may have unpredictable performance. If you want to execute this
query despite the performance unpredictability, use ALLOW FILTERING"

Using Cassandra like this will probably perform far from optimally.
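
The faster route is the one from my earlier message (quoted below): one table
per query, with every write going to all of them. A rough sketch for your use
cases follows. The table names, the hour_bucket column and its format are
made up for illustration, and I have not run this against 1.2.11:

```cql
-- Use cases 1 and 3: all rows for an employee_name, newest first.
create table employees_by_name (
    employee_name text,
    last_modified_date timeuuid,
    employee_id text,
    value text,
    primary key (employee_name, last_modified_date)
) with clustering order by (last_modified_date desc);

-- Use case 1: everything for one employee_name.
select * from employees_by_name where employee_name = 'e27';

-- Use case 3: the most recent employee_id for one employee_name
-- (the first row, because of the descending clustering order).
select employee_id from employees_by_name
 where employee_name = 'e27' limit 1;

-- Use case 2: changes grouped into hourly partitions (hypothetical bucket).
create table changes_by_hour (
    hour_bucket text,              -- e.g. '2013-11-03 13', computed by the client
    last_modified_date timeuuid,
    employee_id text,
    employee_name text,
    value text,
    primary key (hour_bucket, last_modified_date)
);

-- A time range inside one bucket: a plain slice of one partition,
-- so no ALLOW FILTERING and no full scan.
select * from changes_by_hour
 where hour_bucket = '2013-11-03 13'
   and last_modified_date > mintimeuuid('2013-11-03 13:33:30')
   and last_modified_date < maxtimeuuid('2013-11-03 13:33:45');
```

The application has to write each change to every table (for example inside
one BEGIN BATCH ... APPLY BATCH), generating the timeuuid once on the client
instead of calling now() in each insert, so that all copies agree. A "last 5
minutes" window can straddle two hour buckets, in which case you query both.
Note also that employees_by_name keeps one row per change, so use case 1
returns the change history rather than only the current value.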

Hannu


2013/11/3 Techy Teck <comptechgeeky@gmail.com>

> Thanks Hannu. I got your point. But in my example `employee_id` won't be
> larger than `32767`, so I am thinking of creating an index on these two
> columns:
>
>     create index employee_name_idx on test (employee_name);
>     create index last_modified_date_idx on test (last_modified_date);
>
> The chances of executing the above queries are very small. We will run
> them very rarely, but when we do, I want the system to be capable of it.
>
> After creating the indexes, I can execute the queries below:
>
>     select * from test where employee_name = 'e27';
>
>     select employee_id from test where employee_name = 'e27';
>     select * from test where employee_id = '1';
>
> But I cannot execute this query: "Give me everything that has changed within
> 15 minutes". I wrote it like this:
>
>     select * from test where last_modified_date > mintimeuuid('2013-11-03 13:33:30')
>       and last_modified_date < maxtimeuuid('2013-11-03 13:33:45');
>
> But it doesn't run, and I always get this error:
>
>     Bad Request: No indexed columns present in by-columns clause with Equal operator
>
> Any thoughts on what I am doing wrong here?
>
> On Sun, Nov 3, 2013 at 12:43 PM, Hannu Kröger <hkroger@gmail.com> wrote:
>
>> Hi,
>>
>> You cannot query using a field that is not indexed in CQL. You have to
>> create either a secondary index or your own index tables, which you
>> maintain yourself and query through. Since those keys have high
>> cardinality, the usual recommendation for this kind of use case is to
>> create several tables, each holding all the data:
>>
>> 1) A table with employee_id as the primary key.
>> 2) A table with last_modified_date as the primary key (use case 2).
>> 3) A table with employee_name as the primary key (your test query with
>> employee_name 'e27' and use cases 1 and 3).
>>
>> Then you populate all those tables with your data and use whichever
>> table fits each query.
>>
>> Cheers,
>> Hannu
>>
>>
>>
>> 2013/11/3 Techy Teck <comptechgeeky@gmail.com>
>>
>>> I have the table below in CQL:
>>>
>>> create table test (
>>>     employee_id text,
>>>     employee_name text,
>>>     value text,
>>>     last_modified_date timeuuid,
>>>     primary key (employee_id)
>>>    );
>>>
>>>
>>> I inserted a couple of records into the table above, the same way I will
>>> insert them in our actual use case:
>>>
>>>     insert into test (employee_id, employee_name, value, last_modified_date) values ('1', 'e27', 'some_value', now());
>>>     insert into test (employee_id, employee_name, value, last_modified_date) values ('2', 'e27', 'some_new_value', now());
>>>     insert into test (employee_id, employee_name, value, last_modified_date) values ('3', 'e27', 'some_again_value', now());
>>>     insert into test (employee_id, employee_name, value, last_modified_date) values ('4', 'e28', 'some_values', now());
>>>     insert into test (employee_id, employee_name, value, last_modified_date) values ('5', 'e28', 'some_new_values', now());
>>>
>>>
>>>
>>> Then I ran a select query for "give me all the employee_ids for
>>> employee_name `e27`":
>>>
>>>     select employee_id from test where employee_name = 'e27';
>>>
>>> And this is the error I get:
>>>
>>>     Bad Request: No indexed columns present in by-columns clause with Equal operator
>>>     Perhaps you meant to use CQL 2? Try using the -2 option when starting cqlsh.
>>>
>>>
>>> Am I doing something wrong here?
>>>
>>> In general, my use cases are:
>>>
>>>  1. Give me everything for a given employee_name.
>>>  2. Give me everything that has changed in the last 5 minutes.
>>>  3. Give me the latest employee_id for a given employee_name.
>>>
>>> I am running Cassandra 1.2.11
>>>
>>>
>>
>

--001a1132f26adb0eb304ea5656c9--