cassandra-user mailing list archives

From Morgan Segalis <msega...@gmail.com>
Subject Re: Data model question, storing Queue Message
Date Mon, 30 Apr 2012 10:55:15 GMT
Hi Aaron,

Thank you for your answer; I was beginning to think that my question would never be answered ;-)

Actually, this is what I was going for, with one difference: instead of partitioning rows per
month, I thought about partitioning per day. That way, every day I launch the cleaning tool,
and it deletes the day from X months earlier. I guess that will reduce the workload drastically.
Does it have any downside compared to month partitioning?
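
To make the per-day idea concrete, here is a minimal sketch in plain Python (no Cassandra
client involved). The key layout `ReceiverID:YYYYMMDD`, the function names, and the use of a
retention period in days instead of "X months" are all assumptions for illustration only. Note
that with this key layout the cleaning tool still has to know which receiver IDs to visit.

```python
from datetime import date, timedelta

def row_key(receiver_id: int, day: date) -> str:
    """Build a per-user, per-day row key, e.g. '42:20120430' (hypothetical layout)."""
    return f"{receiver_id}:{day:%Y%m%d}"

def expired_day(today: date, retention_days: int) -> date:
    """Return the single day whose rows the daily cleaning run should delete."""
    return today - timedelta(days=retention_days)

def keys_to_delete(receiver_ids, today: date, retention_days: int):
    """List the row keys to delete today, one per known receiver."""
    day = expired_day(today, retention_days)
    return [row_key(r, day) for r in receiver_ids]
```

For example, `keys_to_delete([1, 2], date(2012, 4, 30), 90)` gives the two row keys for
31 January 2012, each of which could then be deleted as a whole row.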

At one point I was going to do something like the twissandra example: having one CF for the
users' queues, and another CF with a row per day storing the ID of every message of that day.
That way, if I want to delete them, I only look into this row and use the IDs to delete the
messages in the users' queue CF. Is that a good way to do it, or should I stick with the first
implementation?
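
A rough sketch of that second layout, using in-memory dictionaries to stand in for the two
CFs. Everything here is hypothetical: the names `user_queue`, `day_index`, `store` and
`purge_day` are mine, and I assume the per-day row stores the receiver ID next to each message
ID so that the cleaner knows which queue row to touch.

```python
from collections import defaultdict
from datetime import date

# Stand-ins for the two column families (assumed layout, not real Cassandra calls).
user_queue = defaultdict(dict)  # receiver_id -> {message_id: body}
day_index = defaultdict(set)    # 'YYYYMMDD'  -> {(receiver_id, message_id)}

def store(receiver_id, message_id, body, day: date):
    """Write the message to the user's queue and record it in the day's index row."""
    user_queue[receiver_id][message_id] = body
    day_index[f"{day:%Y%m%d}"].add((receiver_id, message_id))

def purge_day(day: date) -> int:
    """Delete every message indexed under `day`; return how many were still queued."""
    entries = day_index.pop(f"{day:%Y%m%d}", set())
    removed = 0
    for receiver_id, message_id in entries:
        # Messages already retrieved and deleted by the user are simply skipped.
        if user_queue.get(receiver_id, {}).pop(message_id, None) is not None:
            removed += 1
    return removed
```

The cost of this layout is one extra write per message (the index entry), in exchange for a
cleanup that reads a single index row per day.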

Best regards,

Morgan.

On 30 Apr 2012, at 05:52, aaron morton wrote:

> Message Queue is often not a great use case for Cassandra. For information on how to
> handle high delete workloads see http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
> 
> It's hard to create a model without some idea of the data load, but I would suggest you
> start with:
> 
> CF: UserMessages
> Key: ReceiverID
> Columns : column name = TimeUUID ; column value = message ID and Body
> 
> That will order the messages by time. 
> 
> Depending on load (and to support deleting a previous month's messages) you may want to
> partition the rows by month:
> 
> CF: UserMessagesMonth
> Key: ReceiverID+YYYYMM
> Columns : column name = TimeUUID ; column value = message ID and Body
> 
> Everything is the same as before, but now a user has a row for each month, which you
> can delete as a whole. This also helps avoid very big rows. 
> 
>> I really don't think that storage will be an issue: I have 2TB per node, and messages
>> are limited to 1KB.
> I would suggest you keep the per node limit to 300 to 400 GB. It can take a long time
> to compact, repair and move the data when it gets above 400GB. 
> 
> Hope that helps. 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
> 
>> Hi everyone !
>> 
>> I'm fairly new to Cassandra and I'm not yet familiar with the column-oriented
>> NoSQL model.
>> I have worked on it for a while, but I can't seem to find the best model for what I'm
>> looking for.
>> 
>> I have Erlang software that lets users connect and communicate with each other.
>> When a user (A) sends a message to a disconnected user (B), it stores the message in the
>> database, waits for user (B) to connect and retrieve the message queue, and then
>> deletes it. 
>> 
>> Here are some key points: 
>> - Users are identified by integer IDs
>> - Each message is unique by the combination of: Sender ID - Receiver ID - Message ID
>> - time
>> 
>> I have a message queue, and here are the operations I need to do as fast as possible: 
>> 
>> - Store from 1 to X messages per registered user
>> - Get the number of stored messages per user (can be an incremental variable updated
>> at each store; this is often retrieved)
>> - Retrieve all messages from a user at once.
>> - Delete all messages from a user at once.
>> - delete all messages that are older than Y months (from all users).
>> 
>> I really don't think that storage will be an issue: I have 2TB per node, and messages
>> are limited to 1KB.
>> I'm really looking for speed rather than storage optimization.
>> 
>> My configuration is 2 dedicated servers, each with:
>> - 4 x Intel i7 2.66 GHz
>> - 64 bits
>> - 24 GB
>> - 2 TB
>> 
>> Thank you all.
> 

