cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mehak Mehta <meme...@cs.stonybrook.edu>
Subject Re: 2d or multi dimension range query in cassandra CQL
Date Sun, 22 Mar 2015 06:44:09 GMT
Hi,

On the basis of some suggestions, I tried using tuplejump for multidimensional
queries. Since other mostly needed root permissions (for building ) which I
don't have on my cluster account.

I found a major problem in tuplejump (stargate-core). When I am using it
with a list type field in my table. It stops working.
For e.g.

create table person (
    id int primary key,
    isActive boolean,
    age int,
    eyeColor varchar,
    name text,
    gender varchar,
    company varchar,
    email varchar,
    phone varchar,
    address text,
    points list<double>,
    stargate text
);

with indexing as:
CREATE CUSTOM INDEX person_idx ON PERSON(stargate) USING some
'com.tuplejump.stargate.RowIndex' WITH options =
{
        'sg_options':'{
                "fields":{
                "eyeColor":{},
                        "age":{},
                        "phone":{}
                        }
        }'
};

If I insert data in the table along with points list. The following query
won't give any results (0 rows):

SELECT * FROM RESULTS1 WHERE stargate ='{
    filter: {
        type: "range",
        field: "x",
        lower: 0
    }
}';

I tried removing points list<double> from the table and it works i.e. same
query will return results.
Can somebody help me with this problem as I couldn't find much support from
Stargate.

Please note that I am using Cassandra 2.0.9 compatible with Stargate-core
as given in link (
http://stargate-core.readthedocs.org/en/latest/quickstart.html).

Thanks,
Mehak


On Wed, Mar 18, 2015 at 5:45 AM, Andres de la Peña <adelapena@stratio.com>
wrote:

> Hi,
>
> With Stratio Cassandra you can create Lucene based indexes for
> multidimensional queries this way:
>
> ALTER TABLE images.results1 ADD lucene text ;
>
> CREATE CUSTOM INDEX lucene_idx ON images.results1 (lucene)
> USING 'com.stratio.cassandra.index.RowIndex'
> WITH OPTIONS = {
>  'refresh_seconds':'1',
>  'schema':'{
>   fields:{
>   image_caseid:{type:"string"},
> x:{type:"double"},
> y:{type:"double"} } } '};
>
> Then you can perform the query using the dummy column:
>
> SELECT * FROM images.results1 WHERE lucene='{ filter:{type:"boolean",
> must:[
> {field:"image_caseid", type:"match", value:"mehak"},
> {field:"x", type:"range", lower:100},
> {field:"y", type:"range", lower:100}
> ]}}';
>
> However, you can take advantage of partition key to route the query only
> to the nodes owning the data:
>
> SELECT * FROM images.results1 WHERE image_caseid='mehak' AND lucene='{
> filter:{type:"boolean", must:[
> {field:"x", type:"range", lower:100},
> {field:"y", type:"range", lower:100}
> ]}}';
>
> Or, even better:
>
> SELECT * FROM images.results1 WHERE image_caseid='mehak' AND x>100 AND
> lucene='{ filter:{field:"y", type:"range", lower:100}}';
>
> Additionally, if your data are geospatial (latitude and longitude), soon
> you will can use the incoming spatial features.
>
>
>
> 2015-03-17 23:01 GMT+01:00 Mehak Mehta <memehta@cs.stonybrook.edu>:
>
>> Sorry I gave you wrong table definition for query. Here a composite key
>> of image_caseid, x and uuid which is unique. I have used x in clustering
>> columns to query it. And used secondary index on y column.
>>
>> 1. Example
>> *cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
>> analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
>> double, submit_date timestamp, points list<double>,  PRIMARY KEY
>> ((image_caseid),x,uuid));*
>> *cqlsh:images> create index results1_y on results1(y);*
>>
>> In the below query you can see I have image_caseid as primary key which
>> is filtered. Even then it is giving error that "*No indexed columns
>> present*"
>>
>> *cqlsh:images> select * from results1 where image_caseid='mehak' and x >
>> 100 and y<100 order by image_caseid asc;*
>> *code=2200 [Invalid query] message="No indexed columns present in
>> by-columns clause with Equal operator"*
>>
>> 2. Example
>> I also tried including both x and y columns as composite key even then
>> query gives following error:
>>
>> *cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
>> analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
>> double, submit_date timestamp, points list<double>,  PRIMARY KEY
>> ((image_caseid),x,y,uuid));*
>>
>> *cqlsh:images> select * from results1 where image_caseid='mehak' and x >
>> 100 and y<100 order by image_caseid asc;*
>> *code=2200 [Invalid query] message="PRIMARY KEY column "y" cannot be
>> restricted (preceding column "ColumnDefinition{name=x,
>> type=org.apache.cassandra.db.marshal.DoubleType, kind=CLUSTERING_COLUMN,
>> componentIndex=0, indexName=null, indexType=null}" is either not restricted
>> or by a non-EQ relation)"*
>>
>> Thanks,
>> Mehak
>>
>>
>> On Tue, Mar 17, 2015 at 5:19 PM, Jack Krupansky <jack.krupansky@gmail.com
>> > wrote:
>>
>>> Yeah, you may have to add a dummy column populated with a constant, or
>>> just pick some other additional column to index that you can apply = to.
>>> It's just a requirement to assure that a filtered select isn't too-too slow.
>>>
>>> Uh.... if you are applying = to your primary key then there is no need
>>> for any filtering. What did you really mean to do? It makes no sense the
>>> way you have it!
>>>
>>> Either go with DSE Search/Solr, or google "Tuplejump Stargate" or
>>> "Stratio".
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Mar 17, 2015 at 4:51 PM, Mehak Mehta <memehta@cs.stonybrook.edu>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> In my case I am just trying to do range queries on 2 dimensions i.e. x
>>>> and y.
>>>>
>>>> *cqlsh:images> CREATE TABLE images.results1 (uuid uuid,
>>>> analysis_execution_id varchar, analysis_execution_uuid uuid, x  double, y
>>>> double, submit_date timestamp, points list<double>,  PRIMARY KEY
>>>> (image_caseid));*
>>>>
>>>> With secondary indexes on x and y:
>>>> *cqlsh:images> create index results1_y on results1(y);*
>>>> *cqlsh:images> create index results1_x on results1(x);*
>>>>
>>>> But when I try the following the query it is not allowed with an error.
>>>>
>>>> *cqlsh:images> select * from results1 where image_caseid='mehak' and x
>>>> > 100 and y <100 allow filtering;*
>>>> *code=2200 [Invalid query] message="No indexed columns present in
>>>> by-columns clause with Equal operator"*
>>>>
>>>> Can you tell me why this is not working. I can't understand why
>>>> Cassandra is not allowing me to use Non EQ range query on secondary indexes
>>>> even when I have restricted my primary key by EQ as specified in error
>>>> message.
>>>>
>>>> Also please elaborate on how to use Lucene/Solr in this case with
>>>> Cassandra.
>>>>
>>>> Thanks,
>>>> Mehak
>>>>
>>>> On Tue, Mar 17, 2015 at 2:26 PM, 健峰 高 <g@beecloud.cn> wrote:
>>>>
>>>>> Lucene or Solr is better. I’m using lucene
>>>>>
>>>>> 在 2015年3月18日,上午2:11,Jack Krupansky <jack.krupansky@gmail.com>
写道:
>>>>>
>>>>> 1. Create multiple secondary indexes, one for each non-key column you
>>>>> need to index on. Not recommended. Considered an anti-pattern for Cassandra.
>>>>> 2. Use DSE Search/Solr.
>>>>> 3. Use Lucene-based indexing with TumpleJump/Stargate or Stratio.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Tue, Mar 17, 2015 at 1:49 PM, Mehak Mehta <
>>>>> memehta@cs.stonybrook.edu> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I want to perform range queries (as in x and y ranges) on a large
>>>>>> data billions of rows.
>>>>>> CQL allows me to put Non EQ restrictions on only one of the
>>>>>> clustering columns.
>>>>>> Its not allowing me to filter the data using any other column even
>>>>>> with use of Allow Filtering option.
>>>>>>
>>>>>> cqlsh:images> *select * from results1 where image_caseid='mehak'
and
>>>>>> x > 100 and y <100 order by x allow filtering;*
>>>>>> *code=2200 [Invalid query] message="No indexed columns present in
>>>>>> by-columns clause with Equal operator"*
>>>>>>
>>>>>> My table definition is :
>>>>>>
>>>>>> *CREATE TABLE images.results1 (uuid uuid, analysis_execution_id
>>>>>> varchar, analysis_execution_uuid uuid, x  double, y double, loc varchar,
w
>>>>>> double, h double, normalized varchar, type varchar, filehost varchar,
>>>>>> filename varchar, image_uuid uuid, image_uri varchar, image_caseid
varchar,
>>>>>> image_mpp_x double, image_mpp_y double, image_width double, image_height
>>>>>> double, objective double, cancer_type varchar,  Area float, submit_date
>>>>>> timestamp, points list<double>,  PRIMARY KEY ((image_caseid),x));*
>>>>>>
>>>>>> Kindly suggest me how to model is data and perform 2d or in general
>>>>>> multi dimension range queries in Cassandra efficiently.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Mehak
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
>
> Andrés de la Peña
>
>
> <http://www.stratio.com/>
> Avenida de Europa, 26. Ática 5. 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
>

Mime
View raw message