incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aaron morton <aa...@thelastpickle.com>
Subject Re: Multiple Primary Keys on an IN clause or 2i?
Date Wed, 27 Mar 2013 20:03:19 GMT
> CREATE TABLE msg_archive(
> thread_id varchar,
> ts timestamp,
> msg blob,
> PRIMARY KEY (thread_id, ts))
This with reversed clustering so the most recent columns are at the start (makes it quicker
to get the last X messages) see http://www.datastax.com/docs/1.2/cql_cli/cql/CREATE_TABLE#cql-create-columnfamily

>  SELECT * FROM msg_archive WHERE thread_id IN ('brian|john', 'brian|james'….) AND ts
< 1234567890;
> 

This but do not request 1000 rows per call. Each row becomes a request on a node, and there
is a max of 32 threads processing those reads. Requests of 50 to 100 are reasonable depending
on the number of nodes you have. If you only have 3 nodes I would start smaller. 

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 7:17 PM, Byron Wang <byron.wang@woowteam.com> wrote:

> Hi,  
> 
> I'm currently trying to implement an offline message retrieval solution wherein I retrieve
messages after a particular timestamp for specific users. My question is will what route should
I go for…multple primary keys on an IN clause or using 2i
> 
> 
> The current model of the messages table looks something like this
> 
> CREATE TABLE msg_archive(
> thread_id varchar,
> ts timestamp,
> msg blob,
> PRIMARY KEY (thread_id, ts))
> 
> 
> 
> where thread_id is an alphabetized order of sender and recipient such as "brian|john"
> 
> Now, in order to retrieve the messages, I will have to retrieve them based on the number
of contacts you have and as such the query will look something like this
> 
> SELECT * FROM msg_archive WHERE thread_id IN ('brian|john', 'brian|james'….) AND ts
< 1234567890;
> 
> Ofcourse the list of friends a user can have can potentially reach around 500 or even
worse 1000 so the IN clause can potentially have these large amount of primary keys.
> 
> 
> 
> 
> The question is will this work well or do I have to modify the schema such that we should
incorporate secondary indexes And look something like this instead?
> 
> CREATE TABLE msg_archive(
> thread_id varchar,
> recipient varchar,
> ts timestamp,
> msg blob,
> PRIMARY KEY (thread_id, ts))
> 
> 
> 
> CREATE INDEX ON msg_archive (recipient);
> 
> For the select statement, ofcourse it will be as simple as
> 
> SELECT * FROM msg_archive WHERE recipient = 'brian' AND ts < 1234567890;
> 
> 
> 
> Which is actually better in terms of performance? Or are there other suggestions to this
kind of model?
> 
> Thanks!
> Byron
> 
> 
> 
> --  
> Byron Wang
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> 
> 


Mime
View raw message