incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Byron Wang <byron.w...@woowteam.com>
Subject Multiple Primary Keys on an IN clause or 2i?
Date Tue, 26 Mar 2013 06:17:53 GMT
Hi,  

I'm currently trying to implement an offline message retrieval solution wherein I retrieve
messages after a particular timestamp for specific users. My question is will what route should
I go for…multple primary keys on an IN clause or using 2i


The current model of the messages table looks something like this

CREATE TABLE msg_archive(
thread_id varchar,
ts timestamp,
msg blob,
PRIMARY KEY (thread_id, ts))



where thread_id is an alphabetized order of sender and recipient such as "brian|john"

Now, in order to retrieve the messages, I will have to retrieve them based on the number of
contacts you have and as such the query will look something like this

SELECT * FROM msg_archive WHERE thread_id IN ('brian|john', 'brian|james'….) AND ts <
1234567890;

Ofcourse the list of friends a user can have can potentially reach around 500 or even worse
1000 so the IN clause can potentially have these large amount of primary keys.




The question is will this work well or do I have to modify the schema such that we should
incorporate secondary indexes And look something like this instead?

CREATE TABLE msg_archive(
thread_id varchar,
recipient varchar,
ts timestamp,
msg blob,
PRIMARY KEY (thread_id, ts))



CREATE INDEX ON msg_archive (recipient);

For the select statement, ofcourse it will be as simple as

SELECT * FROM msg_archive WHERE recipient = 'brian' AND ts < 1234567890;



Which is actually better in terms of performance? Or are there other suggestions to this kind
of model?

Thanks!
Byron



--  
Byron Wang
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)



Mime
View raw message