cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andres de la Peña <adelap...@stratio.com>
Subject Re: 2d or multi dimension range query in cassandra CQL
Date Wed, 18 Mar 2015 09:45:49 GMT
Hi,

With Stratio Cassandra you can create Lucene based indexes for
multidimensional queries this way:

ALTER TABLE images.results1 ADD lucene text ;

CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
USING 'com.stratio.cassandra.index.RowIndex'
WITH OPTIONS = {
 'refresh_seconds':'1',
 'schema':'{
  fields:{
  image_caseid:{type:"string"},
x:{type:"double"},
y:{type:"double"} } } '};

Then you can perform the query using the dummy column:

SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean", must:[
{field:"image_caseid", type:"match", value:"mehak"},
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

However, you can take advantage of partition key to route the query only to
the nodes owning the data:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{
filter:{type:"boolean", must:[
{field:"x", type:"range", lower:100},
{field:"y", type:"range", lower:100}
]}}';

Or, even better:

SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND
lucene='{ filter:{field:"y", type:"range", lower:100}}';

Additionally, if your data are geospatial (latitude and longitude), soon
you will can use the incoming spatial features.



2015-03-17 23:01 GMT+01:00 Mehak Mehta <memehta@cs.stonybrook.edu>:

> Sorry I gave you wrong table definition for query. Here a composite key
> of image_caseid, x and uuid which is unique. I have used x in clustering
> columns to query it. And used secondary index on y column.
>
> 1. Example
> *cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
> analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
> double, submit_date timestamp, points list<double>,  PRIMARY KEY
> ((image_caseid),x,uuid));*
> *cqlsh:images> create index results1_y on results1(y);*
>
> In the below query you can see I have image_caseid as primary key which is
> filtered. Even then it is giving error that "*No indexed columns present*
> "
>
> *cqlsh:images> select * from results1 where image_caseid='mehak' and x >
> 100 and y<100 order by image_caseid asc;*
> *code=2200 [Invalid query] message="No indexed columns present in
> by-columns clause with Equal operator"*
>
> 2. Example
> I also tried including both x and y columns as composite key even then
> query gives following error:
>
> *cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
> analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
> double, submit_date timestamp, points list<double>,  PRIMARY KEY
> ((image_caseid),x,y,uuid));*
>
> *cqlsh:images> select * from results1 where image_caseid='mehak' and x >
> 100 and y<100 order by image_caseid asc;*
> *code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be
> restricted (preceding column "ColumnDefinition{name=x,
> type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
> componentIndex=0, indexName=null, indexType=null}" is either not restricted
> or by a non-EQ relation)"*
>
> Thanks,
> Mehak
>
>
> On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <jack.krupansky@gmail.com>
> wrote:
>
>> Yeah, you may have to add a dummy column populated with a constant, or
>> just pick some other additional column to index that you can apply = to.
>> It's just a requirement to assure that a filtered select isn't too-too slow.
>>
>> Uh.... if you are applying = to your primary key then there is no need
>> for any filtering. What did you really mean to do? It makes no sense the
>> way you have it!
>>
>> Either go with DSE Search/Solr, or google "Tuplejump Stargate" or
>> "Stratio".
>>
>> -- Jack Krupansky
>>
>> On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <memehta@cs.stonybrook.edu>
>> wrote:
>>
>>> Hi,
>>>
>>> In my case I am just trying to do range queries on 2 dimensions i.e. x
>>> and y.
>>>
>>> *cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
>>> analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
>>> double, submit_date timestamp, points list<double>,  PRIMARY KEY
>>> (image_caseid));*
>>>
>>> With secondary indexes on x and y:
>>> *cqlsh:images> create index results1_y on results1(y);*
>>> *cqlsh:images> create index results1_x on results1(x);*
>>>
>>> But when I try the following the query it is not allowed with an error.
>>>
>>> *cqlsh:images> select * from results1 where image_caseid='mehak' and x >
>>> 100 and y <100 allow filtering;*
>>> *code=2200 [Invalid query] message="No indexed columns present in
>>> by-columns clause with Equal operator"*
>>>
>>> Can you tell me why this is not working. I can't understand why
>>> Cassandra is not allowing me to use Non EQ range query on secondary indexes
>>> even when I have restricted my primary key by EQ as specified in error
>>> message.
>>>
>>> Also please elaborate on how to use Lucene/Solr in this case with
>>> Cassandra.
>>>
>>> Thanks,
>>> Mehak
>>>
>>> On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <g@beecloud.cn> wrote:
>>>
>>>> Lucene or Solr is better. I’m using lucene
>>>>
>>>> 在 2015年3月18日,上午2:11,Jack Krupansky <jack.krupansky@gmail.com>
写道:
>>>>
>>>> 1. Create multiple secondary indexes, one for each non-key column you
>>>> need to index on. Not recommended. Considered an anti-pattern for Cassandra.
>>>> 2. Use DSE Search/Solr.
>>>> 3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <memehta@cs.stonybrook.edu
>>>> > wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to perform range queries (as in x and y ranges) on a large data
>>>>> billions of rows.
>>>>> CQL allows me to put Non EQ restrictions on only one of the clustering
>>>>> columns.
>>>>> Its not allowing me to filter the data using any other column even
>>>>> with use of Allow Filtering option.
>>>>>
>>>>> cqlsh:images> *select * from results1 where image_caseid='mehak' and
>>>>> x > 100 and y <100 order by x allow filtering;*
>>>>> *code=2200 [Invalid query] message="No indexed columns present in
>>>>> by-columns clause with Equal operator"*
>>>>>
>>>>> My table definition is :
>>>>>
>>>>> *CREATE TABLE images.results1 (uuid uuid, analysis_execution_id
>>>>> varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar,
w
>>>>> double, h double, normalized varchar, type varchar, filehost varchar,
>>>>> filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
>>>>> image_mpp_x double, image_mpp_y double, image_width double, image_height
>>>>> double, objective double, cancer_type varchar,  Area float, submit_date
>>>>> timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));*
>>>>>
>>>>> Kindly suggest me how to model is data and perform 2d or in general
>>>>> multi dimension range queries in Cassandra efficiently.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Mehak
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>


-- 

Andrés de la Peña


<http://www.stratio.com/>
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*

Mime
View raw message