Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of jaluce06@gmail.com designates
 209.85.210.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <B2B23EC0-FD15-4C47-8191-8794B4D00C90@thelastpickle.com>
References: 
 <CALnckSpwJa4j6JrWY6+CJJ0K8M3vne0w3MGRfyt_Zow-HDUJ_w@mail.gmail.com>
	<B2B23EC0-FD15-4C47-8191-8794B4D00C90@thelastpickle.com>
Date: Tue, 21 Aug 2012 11:14:52 +0200
Message-ID: 
 <CALnckSqLj=muifH4uu8--dqe7vcEt4ZxB2mLAm6V2zvXyrrHeA@mail.gmail.com>
Subject: Re: Secondary index and/or row key in the read path ?
From: Jean-Armel Luce <jaluce06@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=e89a8f2343b51e093d04c7c30f31

--e89a8f2343b51e093d04c7c30f31
Content-Type: text/plain; charset=ISO-8859-1

Hi Aaron,

Thank you for your answer.

So, I will post-process in our application: select the row by its row key,
and then apply the column-level filter to the result.
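
A rough sketch of what that post-processing could look like, in Python. This
is not code from this thread: the in-memory dict TESTWHERE stands in for the
"testwhere" table, and fetch_by_key and select_with_filter are hypothetical
helper names. In a real application, fetch_by_key would run the key-only
SELECT through whatever client library is in use.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the "testwhere" table, keyed by row key.
TESTWHERE = {
    "key1": {"mykey": "key1", "col1": "abcd", "col2": "efgh"},
}


def fetch_by_key(mykey: str) -> Optional[dict]:
    """Stand-in for: SELECT mykey, col1, col2 FROM testwhere WHERE mykey = ?"""
    return TESTWHERE.get(mykey)


def select_with_filter(mykey: str, **conditions: str) -> Optional[dict]:
    """Fetch the row by key, then keep it only if every column condition matches."""
    row = fetch_by_key(mykey)
    if row is None:
        return None  # no row with this key
    if all(row.get(col) == value for col, value in conditions.items()):
        return row
    return None  # row exists but fails the column-level filter


if __name__ == "__main__":
    print(select_with_filter("key1", col1="abcd"))
    print(select_with_filter("key1", col1="zzzz"))
```

Since the lookup by row key returns at most one row, the filter only ever
examines a single row on the client side.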

Best Regards,
Jean-Armel

2012/8/21 aaron morton <aaron@thelastpickle.com>

> - do we need to post-process (filter) the result of the query in our
> application?
>
> That's the one :)
>
> Right now the code paths don't exist to select a row using a row key *and*
> apply a column-level filter. The RPC API does not work that way, and I'm not
> sure whether this is planned for CQL.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/08/2012, at 6:33 PM, Jean-Armel Luce <jaluce06@gmail.com> wrote:
>
>
> Hello,
>
> I am using Cassandra 1.1.1 and CQL3.
>
> Could you tell me the best strategy for retrieving a row using an equality
> condition (operator =) on the row key while also filtering on a 2nd column?
>
> For example, I create a table named "testwhere" with a row key on column
> "mykey" and 2 other columns, "col1" and "col2".
>
> I would like to retrieve the row with the key 'key1' only if col1 = 'abcd'.
> I send the request:
> SELECT mykey, col1 from testwhere where mykey = 'key1' and col1 = 'abcd';
> As you can see, the 1st condition in the WHERE clause is based on the row
> key.
> However, the request fails if no secondary index exists on the column used
> in the 2nd condition of the WHERE clause. It works only if a secondary
> index is created on this 2nd column (see below).
> Does that mean that the secondary index is used in the read path instead
> of the row key, even if there is a condition on the row key in the WHERE
> clause?
>
> Here is an example:
>
> jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
> Connected to Test Cluster at localhost:9160.
> [cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
> Use HELP for help.
> cqlsh> use test1;
> cqlsh:test1> CREATE TABLE testwhere (mykey varchar PRIMARY KEY,
>          ...  col1 varchar,
>          ...  col2 varchar);
> cqlsh:test1> INSERT INTO testwhere (mykey, col1, col2) VALUES ('key1', 'abcd', 'efgh');
>
> cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1';
>  mykey | col1
> -------+------
>   key1 | abcd
>
> cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1' and col1 = 'abcd';
> Bad Request: No indexed columns present in by-columns clause with Equal operator
>
> cqlsh:test1> CREATE INDEX col1_idx ON testwhere (col1);
> cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1' and col1 = 'abcd';
>  mykey | col1
> -------+------
>   key1 | abcd
>
> cqlsh:test1>
>
>
> My understanding is:
> The 1st SELECT works because only the row key is in the WHERE clause.
> The 2nd SELECT fails because, although the row key is in the WHERE clause,
> there is no index on col1.
> The 3rd SELECT (which is the same as the 2nd SELECT) works because the row
> key is in the WHERE clause and a secondary index exists on col1.
>
>
> For this use case, what are the recommendations of the Cassandra community?
> - Do we need to create a secondary index for each column we want to filter
>   on?
> - Do we need to post-process (filter) the result of the query in our
>   application?
> - Or is there another solution?
>
>
> Thanks.
>
> Jean-Armel
>
>
>

--e89a8f2343b51e093d04c7c30f31--