Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 80DEC17E90 for ; Wed, 18 Mar 2015 09:46:20 +0000 (UTC) Received: (qmail 21039 invoked by uid 500); 18 Mar 2015 09:46:17 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 20996 invoked by uid 500); 18 Mar 2015 09:46:17 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 20986 invoked by uid 99); 18 Mar 2015 09:46:17 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 Mar 2015 09:46:17 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS,T_REMOTE_IMAGE X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of adelapena@stratio.com designates 209.85.213.178 as permitted sender) Received: from [209.85.213.178] (HELO mail-ig0-f178.google.com) (209.85.213.178) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 Mar 2015 09:45:52 +0000 Received: by igcau2 with SMTP id au2so57868135igc.0 for ; Wed, 18 Mar 2015 02:45:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=cDJVbppCTkiVf6ZRz+/Z2079FrZWy64c2bwhunkJ7BU=; b=knbuOqKO8cakJRqGslLXEpgo0jv6Tm66NkNu/+vH6eYZxp++bK8g/lfx702KrypBDT b5A5vk2ll/qDM+ZiO+ia9PRsOhsrh6BzBhHjJe8lpTkaqEL364d/e3bDTlK700TLLNx9 y2vDARq/qgUu+MfI43Q0d1zdqUTfSnPtXU2XpPO2S0WFoSqhpRkkspB7EwvE4sEKrEUl XNR7SUjBx76tYIWEiSkiOWw8P2Z4oZqWWyLcx8xJO5kV0/RSSKbKneZkfKGQi3XauLsF y2ijaWS+nq0xqTecBnLKdyRsY7Jf5HqmDuI61OwTaw56xM9DWyv1KxXyVB6k//yKTiuH dndQ== X-Gm-Message-State: ALoCoQnhFvCcy7M4uXdIg9SGOnCeTrri+kRFVK20263igaDwGVA5AyZpY9Qp9oemZa8Qq6Au3K0t MIME-Version: 1.0 X-Received: by 10.42.82.199 with SMTP id e7mr86840093icl.18.1426671950033; Wed, 18 Mar 2015 02:45:50 -0700 (PDT) Received: by 10.64.223.229 with HTTP; Wed, 18 Mar 2015 02:45:49 -0700 (PDT) In-Reply-To: References: <4F632A51-55EB-454D-B0F1-532EC3ACE702@beecloud.cn> Date: Wed, 18 Mar 2015 10:45:49 +0100 Message-ID: Subject: Re: 2d or multi dimension range query in cassandra CQL From: =?UTF-8?Q?Andres_de_la_Pe=C3=B1a?= To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=20cf30334657cb127805118cf234 X-Virus-Checked: Checked by ClamAV on apache.org --20cf30334657cb127805118cf234 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi, With Stratio Cassandra you can create Lucene based indexes for multidimensional queries this way: ALTER TABLE images.results1 ADD lucene text ; CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene) USING 'com.stratio.cassandra.index.RowIndex' WITH OPTIONS =3D { 'refresh_seconds':'1', 'schema':'{ fields:{ image_caseid:{type:"string"}, x:{type:"double"}, y:{type:"double"} } } '}; Then you can perform the query using the dummy column: SELECT * FROM images.results1 WHERE lucene=3D'{ filter:{type:"boolean", mus= t:[ {field:"image_caseid", type:"match", value:"mehak"}, {field:"x", type:"range", lower:100}, {field:"y", type:"range", lower:100} ]}}'; However, you can take advantage of partition key to route the query only to the nodes owning the data: SELECT * FROM images.results1 WHERE image_caseid=3D'mehak' AND lucene=3D'{ filter:{type:"boolean", must:[ {field:"x", type:"range", lower:100}, {field:"y", type:"range", lower:100} ]}}'; Or, even better: SELECT * FROM images.results1 WHERE image_caseid=3D'mehak' AND x>100 AND lucene=3D'{ filter:{field:"y", type:"range", lower:100}}'; Additionally, if your data are geospatial (latitude and longitude), soon you will can use the incoming spatial features. 2015-03-17 23:01 GMT+01:00 Mehak Mehta : > Sorry I gave you wrong table definition for query. Here a composite key > of image_caseid, x and uuid which is unique. I have used x in clustering > columns to query it. And used secondary index on y column. > > 1. Example > *cqlsh:images> CREATE TABLE images.results1 (uuid uuid, > analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y > double, submit_date timestamp, points list, PRIMARY KEY > ((image_caseid),x,uuid));* > *cqlsh:images> create index results1_y on results1(y);* > > In the below query you can see I have image_caseid as primary key which i= s > filtered. Even then it is giving error that "*No indexed columns present* > " > > *cqlsh:images> select * from results1 where image_caseid=3D'mehak' and x = > > 100 and y<100 order by image_caseid asc;* > *code=3D2200 [Invalid query] message=3D"No indexed columns present in > by-columns clause with Equal operator"* > > 2. Example > I also tried including both x and y columns as composite key even then > query gives following error: > > *cqlsh:images> CREATE TABLE images.results1 (uuid uuid, > analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y > double, submit_date timestamp, points list, PRIMARY KEY > ((image_caseid),x,y,uuid));* > > *cqlsh:images> select * from results1 where image_caseid=3D'mehak' and x = > > 100 and y<100 order by image_caseid asc;* > *code=3D2200 [Invalid query] message=3D"PRIMARY KEY column "y" cannot be > restricted (preceding column "ColumnDefinition{name=3Dx, > type=3Dorg.apache.cassandra.db.marshal.DoubleType, kind=3DCLUSTERING_COLU= MN, > componentIndex=3D0, indexName=3Dnull, indexType=3Dnull}" is either not re= stricted > or by a non-EQ relation)"* > > Thanks, > Mehak > > > On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky > wrote: > >> Yeah, you may have to add a dummy column populated with a constant, or >> just pick some other additional column to index that you can apply =3D t= o. >> It's just a requirement to assure that a filtered select isn't too-too s= low. >> >> Uh.... if you are applying =3D to your primary key then there is no need >> for any filtering. What did you really mean to do? It makes no sense the >> way you have it! >> >> Either go with DSE Search/Solr, or google "Tuplejump Stargate" or >> "Stratio". >> >> -- Jack Krupansky >> >> On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta >> wrote: >> >>> Hi, >>> >>> In my case I am just trying to do range queries on 2 dimensions i.e. x >>> and y. >>> >>> *cqlsh:images> CREATE TABLE images.results1 (uuid uuid, >>> analysis_execution_id varchar, analysis_execution_uuid uuid, x double,= y >>> double, submit_date timestamp, points list, PRIMARY KEY >>> (image_caseid));* >>> >>> With secondary indexes on x and y: >>> *cqlsh:images> create index results1_y on results1(y);* >>> *cqlsh:images> create index results1_x on results1(x);* >>> >>> But when I try the following the query it is not allowed with an error. >>> >>> *cqlsh:images> select * from results1 where image_caseid=3D'mehak' and = x > >>> 100 and y <100 allow filtering;* >>> *code=3D2200 [Invalid query] message=3D"No indexed columns present in >>> by-columns clause with Equal operator"* >>> >>> Can you tell me why this is not working. I can't understand why >>> Cassandra is not allowing me to use Non EQ range query on secondary ind= exes >>> even when I have restricted my primary key by EQ as specified in error >>> message. >>> >>> Also please elaborate on how to use Lucene/Solr in this case with >>> Cassandra. >>> >>> Thanks, >>> Mehak >>> >>> On Tue, Mar 17, 2015 at 2:26 PM, =E5=81=A5=E5=B3=B0 =E9=AB=98 wrote: >>> >>>> Lucene or Solr is better. I=E2=80=99m using lucene >>>> >>>> =E5=9C=A8 2015=E5=B9=B43=E6=9C=8818=E6=97=A5=EF=BC=8C=E4=B8=8A=E5=8D= =882:11=EF=BC=8CJack Krupansky =E5=86=99=E9=81= =93=EF=BC=9A >>>> >>>> 1. Create multiple secondary indexes, one for each non-key column you >>>> need to index on. Not recommended. Considered an anti-pattern for Cass= andra. >>>> 2. Use DSE Search/Solr. >>>> 3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio. >>>> >>>> -- Jack Krupansky >>>> >>>> On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta >>> > wrote: >>>> >>>>> Hi, >>>>> >>>>> I want to perform range queries (as in x and y ranges) on a large dat= a >>>>> billions of rows. >>>>> CQL allows me to put Non EQ restrictions on only one of the clusterin= g >>>>> columns. >>>>> Its not allowing me to filter the data using any other column even >>>>> with use of Allow Filtering option. >>>>> >>>>> cqlsh:images> *select * from results1 where image_caseid=3D'mehak' an= d >>>>> x > 100 and y <100 order by x allow filtering;* >>>>> *code=3D2200 [Invalid query] message=3D"No indexed columns present in >>>>> by-columns clause with Equal operator"* >>>>> >>>>> My table definition is : >>>>> >>>>> *CREATE TABLE images.results1 (uuid uuid, analysis_execution_id >>>>> varchar, analysis_execution_uuid uuid, x double, y double, loc varch= ar, w >>>>> double, h double, normalized varchar, type varchar, filehost varchar, >>>>> filename varchar, image_uuid uuid, image_uri varchar, image_caseid va= rchar, >>>>> image_mpp_x double, image_mpp_y double, image_width double, image_hei= ght >>>>> double, objective double, cancer_type varchar, Area float, submit_da= te >>>>> timestamp, points list, PRIMARY KEY ((image_caseid),x));* >>>>> >>>>> Kindly suggest me how to model is data and perform 2d or in general >>>>> multi dimension range queries in Cassandra efficiently. >>>>> >>>>> >>>>> Thanks, >>>>> Mehak >>>>> >>>> >>>> >>>> >>> >> > --=20 Andr=C3=A9s de la Pe=C3=B1a Avenida de Europa, 26. =C3=81tica 5. 3=C2=AA Planta 28224 Pozuelo de Alarc=C3=B3n, Madrid Tel: +34 91 352 59 42 // *@stratiobd * --20cf30334657cb127805118cf234 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi,

With Stratio Cassandra you can crea= te Lucene based indexes for multidimensional queries this way:
ALTER TABLE images.re= sults1 ADD lucene text ;

CREATE CUSTOM = INDEX lucene_idx ON images.results1 (lucene)=C2=A0
USING 'com.stratio.cassandra.index.RowInde= x'=C2=A0
WITH OPTI= ONS =3D {
=C2=A0'r= efresh_seconds':'1',=C2=A0
=C2=A0'schema':'{
=C2=A0= fields:{
=C2= =A0 image_caseid:{type:&= quot;string"},
x:{type:"double"= ;},=C2=A0
y:{type:"double"} } } &#= 39;};

Then you can perform the query = using the dummy column:

SELECT * FROM images.results1 WHERE lucene=3D'{ filter:{= type:"boolean", must:[
{field:"= image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower= :100},
{field:"y", type:"range&= quot;, lower:100}
]}}&= #39;;

However, you can take advantage= of partition key to route the query only to the nodes owning the data:

SELECT * FRO= M images.results1 WHERE image_caseid=3D'mehak' AND lucene=3D'{ = filter:{type:"boolean", must:[
{fiel= d:"x", type:"range", lower:100},
= {field:"y", type:"range", lower:100}
]}}';
=
Or, even better:

SELECT * FROM images.results1 WHERE image_caseid=3D'm= ehak' AND x>100 AND lucene=3D'{ filter:{field:"y", typ= e:"range", lower:100}}';

Additionally, if your data are geospatial (latitude and longitu= de), soon you will can use the incoming spatial features.
=


2015-03-17 23:01 GMT+01:00 Mehak Mehta <= memehta@cs.s= tonybrook.edu>:
Sorry I gave you wrong table definition for query. Here a composite k= ey of=C2=A0image_caseid, x and uuid which is unique. I have used x in clust= ering columns to query it. And used secondary index on y column.=C2=A0
=
1. Example
cqlsh:images> CREATE TABLE i= mages.results1 (uuid uuid, analysis_execution_id varchar, analysis_executio= n_uuid uuid, x =C2=A0double, y double, submit_date timestamp, points list&l= t;double>, =C2=A0PRIMARY KEY ((image_caseid),x,uuid));
cqlsh:images> create index results1_y on results1(y);

In the below query you can see I have=C2=A0image_ca= seid as primary key which is filtered. Even then it is giving error that &q= uot;No indexed columns present"=C2=A0

=
cqlsh:images> select * from results1 where image_caseid=3D'm= ehak' and x > 100 and y<100 order by image_caseid asc;
<= span class=3D"">
code=3D2200 [Invalid query]= message=3D"No indexed columns present in by-columns clause with Equal= operator"

2. E= xample
I also tried including both x and y columns as composite k= ey even then query gives following error:

<= b>cqlsh:images> CREATE TABLE images.results1 (uuid uuid, analysis_execut= ion_id varchar, analysis_execution_uuid uuid, x =C2=A0double, y double, sub= mit_date timestamp, points list<double>, =C2=A0PRIMARY KEY ((image_ca= seid),x,y,uuid));

cqlsh:i= mages> select * from results1 where image_caseid=3D'mehak' and x= > 100 and y<100 order by image_caseid asc;
code=3D2= 200 [Invalid query] message=3D"PRIMARY KEY column "y" cannot= be restricted (preceding column "ColumnDefini= tion{name=3Dx, type=3Dorg.apache.cassandra.db.marshal.DoubleType, kind=3DCL= USTERING_COLUMN, componentIndex=3D0, indexName=3Dnull, indexType=3Dnull}&qu= ot; is either not restricted or by a non-EQ relation)"

Thanks,
Mehak=C2=A0


On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupa= nsky <jack.krupansky@gmail.com> wrote:
Yeah, you may have to add a dummy colu= mn populated with a constant, or just pick some other additional column to = index that you can apply =3D to. It's just a requirement to assure that= a filtered select isn't too-too slow.

Uh.... if you= are applying =3D to your primary key then there is no need for any filteri= ng. What did you really mean to do? It makes no sense the way you have it!<= /div>

Either go with DSE Search/Solr, or google "Tu= plejump Stargate" or "Stratio".

-- Jack Krupansky

On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta= <memehta@cs.stonybrook.edu> wrote:
Hi,

In my case I am = just trying to do range queries on 2 dimensions i.e. x and y.

cqlsh:images> CREATE TABLE images.results1 (uui= d uuid, analysis_execution_id varchar, analysis_execution_uuid uuid, x =C2= =A0double, y double, submit_date timestamp, points list<double>, =C2= =A0PRIMARY KEY (image_caseid));

With sec= ondary indexes on x and y:
cqlsh:images> create index resul= ts1_y on results1(y);
cqlsh:images> create index result= s1_x on results1(x);
=C2=A0
But when I try t= he following the query it is not allowed with an error.=C2=A0
cqlsh:images> select * from results1 where image_cas= eid=3D'mehak' and x > 100 and y <100 allow filtering;
code=3D2200 [Invalid query] message=3D"No indexed colu= mns present in by-columns clause with Equal operator"
=

Can you tell me why this is not working. I = can't understand why Cassandra is not allowing me to use Non EQ range q= uery on secondary indexes even when I have restricted my primary key by EQ = as specified in error message.

Also please elabora= te on how to use Lucene/Solr in this case with Cassandra.

Thanks,
Mehak

On Tue, Mar 17, 2015 at 2:26 PM, =E5=81=A5=E5=B3=B0 = =E9=AB=98 <g@beecloud.cn> wrote:
Lucene or Solr is better. I=E2=80=99m using lucen= e
=E5=9C=A8 2015=E5=B9=B43=E6=9C=881= 8=E6=97=A5=EF=BC=8C=E4=B8=8A=E5=8D=882:11=EF=BC=8CJack Krupansky <jack.krupansky@gmai= l.com> =E5=86=99=E9=81=93=EF=BC=9A

1. Create multiple secondary indexes, one for each non-key column = you need to index on. Not recommended. Considered an anti-pattern for Cassa= ndra.
2. Use DSE Search/Solr.
3. Use Lucene-based indexing wi= th TumpleJump/Stargate or Stratio.
-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta= <memehta@cs.stonybrook.edu> wrote:
Hi,

I=C2=A0want to = perform range queries (as in x and y ranges) on a large data billions of ro= ws.=C2=A0
CQL allows me to put Non EQ re= strictions on only one of the clustering columns.=C2=A0
Its not allowing me to filter the data using any other colu= mn even with use of Allow Filtering option.

cqlsh:images>=C2=A0s= elect * from results1 where image_caseid=3D'mehak' and x > 100 a= nd y <100 order by x allow filtering;
code=3D2200 [Inva= lid query] message=3D"No indexed columns present in by-columns clause = with Equal operator"

=
My table definition is :

CREATE TABL= E images.results1 (uuid uuid, analysis_execution_id varchar, analysis_execu= tion_uuid uuid, x =C2=A0double, y double, loc varchar, w double, h double, = normalized varchar, type varchar, filehost varchar, filename varchar, image= _uuid uuid, image_uri varchar, image_caseid varchar, image_mpp_x double, im= age_mpp_y double, image_width double, image_height double, objective double= , cancer_type varchar, =C2=A0Area float, submit_date timestamp, points list= <double>, =C2=A0PRIMARY KEY ((image_caseid),x));

Kindly suggest = me how to model is data and perform 2d or in general multi dimension range = queries in Cassandra efficiently.


T= hanks,
Mehak








--
=

<= div dir=3D"ltr">
Andr=C3=A9s de la Pe=C3=B1a


Avenida de Europa, 26. =C3=81tica 5. 3=C2=AA Planta=
28224 Pozuelo de Alarc=C3=B3n, Madrid
Tel: +34 = 91 352 59 42 //=C2=A0@stratiobd
--20cf30334657cb127805118cf234--