cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Subodh Nijsure <subodh.nijs...@gmail.com>
Subject Re: Help with select IN query in cassandra
Date Mon, 01 Sep 2014 03:08:13 GMT
Thanks for your help Michael.

If specifying asset_id would help I can construct queries that can
include asset_id

So I have been "playing" around with PRIMARY KEY definition and
following table definition

CREATE TABLE sensor_info_table (
  asset_id text,
  event_time timestamp,
  "timestamp" timeuuid,
  sensor_reading map<text, text>,
  sensor_serial_number text,
  sensor_type int,
  PRIMARY KEY ((asset_id, "timestamp"), event_time)
);

It does what I want to do, and I removed the index for timestamp item
since now it is part of primary key and thus my query like this works.

SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
timestamp IN ( 17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
);

But now this doesn't work it give

SELECT * from sensor_info_table where  asset_id='3' ;

Bad Request: Partition key part timestamp must be restricted since
preceding part is

I am keeping index on event_time as I sometime need to query something
"give me all data since time x" i.e. something like this works.

 SELECT * from sensor_info_table where  event_time > '2014-08-31
16:54:02-0700' ALLOW FILTERING;

However if I do this things then this don't work:

SELECT * from sensor_info_table where  asset_id='3' AND event_time >
'2014-08-31 16:54:02-0700';

Bad Request: Partition key part timestamp must be restricted since
preceding part is

Also  I am not conformable with fact that I need to specify ALLOW FILTERING.

I guess cassandra schema design task asks designer to write down
queries before designing schema.

For the above table definition I want to do following queries:

- Give me all sensor data for given asset.
- Give me sensor data that matches given set of timeuuids
- Give me all sendor data for a given asset, that were collected after
| before | between  certain event_time.

Given these query criteria how should  I construct my schema? One
thought has occurred to me is make three tables with each item
asset_id , event_time, timeuuid as primary keys and depending on type
of query choose the table to do query upon. That seems like a waste of
resources (disk, cpu ), also increasing insert times(!) but thats the
way things need to happen in cassandra world its okay. ( I am
two-three weeks into learning about cassandra).

-Subodh

On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael
<michael.laing@nytimes.com> wrote:
> Oh it must be late - I missed the fact that you didn't want to specify
> asset_id. The above queries will still work but you have to use 'allow
> filtering' - generally not a good idea. I'll look again in the morning.
>
>
> On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael <michael.laing@nytimes.com>
> wrote:
>>
>> Hmm. Because the clustering key is (event_time, "timestamp"), event_time
>> must be specified as well - hopefully that info is available to the ux.
>>
>> Unfortunately you will then hit another problem with your query: you are
>> selecting a collection field... this will not work with IN on "timestamp".
>>
>> So you could select all the "timestamp"s for an asset_id/event_time:
>>>
>>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>>> 1231234;
>>
>>
>> Or you could apply a range of "timestamp"s:
>>>
>>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
>>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
>>
>>
>> BTW the secondary indices are not a good idea: high cardinality and of no
>> use in this query that I can see.
>>
>> ml
>>
>>
>> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure <subodh.nijsure@gmail.com>
>> wrote:
>>>
>>> Not really event time stamp is created by the sensor when it reads data
>>> and  timestamp is something server creates when inserting data into
>>> cassandra db.  At later point in time my django ux allows users to browse
>>> this data and reference interesting data points via the timestamp field. The
>>> timestamp field is my bridge between Sal and nosql world.
>>>
>>> Subodh
>>>
>>> On Aug 31, 2014 5:33 PM, "Laing, Michael" <michael.laing@nytimes.com>
>>> wrote:
>>>>
>>>> Are event_time and timestamp essentially representing the same datetime?
>>>>
>>>> On Sunday, August 31, 2014, Subodh Nijsure <subodh.nijsure@gmail.com>
>>>> wrote:
>>>>>
>>>>> I have following database schema
>>>>>
>>>>> CREATE TABLE sensor_info_table (
>>>>>   asset_id text,
>>>>>   event_time timestamp,
>>>>>   "timestamp" timeuuid,
>>>>>   sensor_reading map<text, text>,
>>>>>   sensor_serial_number text,
>>>>>   sensor_type int,
>>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
>>>>> );
>>>>>
>>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>>>>>
>>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>>>>>
>>>>> Now I am able to insert the data into this table, however I am unable
>>>>> to do following query where I want to select items with specific
>>>>> timeuuid values.
>>>>>
>>>>> It gives me following error.
>>>>>
>>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
>>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
>>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>>>>>
>>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
>>>>> (preceding column "event_time" is either not restricted or by a non-EQ
>>>>> relation)
>>>>>
>>>>> What do I have to do to make this work?
>>>>>
>>>>> For what its worth I am using django for my front end development and
>>>>> I am using "timestamp timeuuid" field as unique indentifier to
>>>>> reference specific sensor reading from django framework -- since
>>>>> cassandra doesn't have way to generate unique id upon insert (like
>>>>> old-style rdms's auto-fields).
>>>>>
>>>>>
>>>>> Below is software version info.
>>>>>
>>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
>>>>> Thrift protocol 19.39.0]
>>>>>
>>>>> I really don't understand what the error message preceeding column
>>>>> "event_time" is either not restricted or by no-EQ relation?
>>>>>
>>>>> -Subodh Nijsure
>>
>>
>

Mime
View raw message