cassandra-user mailing list archives

From Morgan Segalis <msega...@gmail.com>
Subject Re: Data model question, storing Queue Message
Date Mon, 30 Apr 2012 12:22:31 GMT
Hi Samal,

Thanks for pointing out the TTL feature; I wasn't aware of its existence.

Partitioning by day will give much narrower rows than partitioning by month (about 30 times
narrower, give or take ;-) ).
Per day there should be something like 100,000 messages stored; most of them would be retrieved,
and therefore deleted, before the TTL has to do its work.
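
As a rough illustration of the day-versus-month trade-off, here is a minimal Python sketch. The `+` separator in the row key, the function names, and the reading of 100,000 messages per day as a cluster-wide total are assumptions made for the example; only the 1KB message limit and the daily volume come from this thread.

```python
from datetime import datetime, timezone

MAX_MESSAGE_BYTES = 1024     # "messages are 1KB limited" (from the thread)
MESSAGES_PER_DAY = 100_000   # estimate from this message, assumed cluster-wide


def day_row_key(receiver_id: int, when: datetime) -> str:
    """Hypothetical day-bucketed row key: ReceiverID+YYYYMMDD."""
    return f"{receiver_id}+{when.strftime('%Y%m%d')}"


def month_row_key(receiver_id: int, when: datetime) -> str:
    """Month-bucketed row key as suggested earlier in the thread: ReceiverID+YYYYMM."""
    return f"{receiver_id}+{when.strftime('%Y%m')}"


def max_stored_bytes(days: int,
                     messages_per_day: int = MESSAGES_PER_DAY,
                     max_bytes: int = MAX_MESSAGE_BYTES) -> int:
    """Upper bound on message payload written over `days`, ignoring overhead and deletes."""
    return days * messages_per_day * max_bytes


sent = datetime(2012, 4, 30, 12, 22, 31, tzinfo=timezone.utc)
print(day_row_key(42, sent))
print(month_row_key(42, sent))
print(max_stored_bytes(1), max_stored_bytes(30))
```

Under these assumptions the payload is at most roughly 100 MB per day across all users, so a month bucket holds about 30 times what a day bucket does.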

On 30 Apr 2012, at 13:16, samal wrote:

> 
> 
> On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <msegalis@gmail.com> wrote:
> Hi Aaron,
> 
> Thank you for your answer, I was beginning to think that my question would never be answered ;-)
> 
> Actually, this is what I was going for, except for one thing: instead of partitioning rows per month, I thought about partitioning per day, so that every day I launch the cleaning tool and it deletes the day from X months earlier.
> 
> Use the TTL feature of columns: a column is removed once its TTL is over (no need for a manual job).
> 
> I guess that will reduce the workload drastically. Does it have any downside compared to month partitioning?
> 
> A key belongs to a particular node, so whether day-wise or month-wise partitioning is appropriate depends on the size of your data. Otherwise it can lead to fat rows, which will cause system problems.
> 
>  
> At one point I was going to do something like the twissandra example: having a CF per user's queue, and another CF per day storing the ID of every message of that day. That way, if I want to delete them, I only look into this row and use the IDs to delete the messages from the user's queue CF… Is that a good way to do it? Or should I stick with the first implementation?
> 
> Best regards,
> 
> Morgan.
> 
> On 30 Apr 2012, at 05:52, aaron morton wrote:
> 
>> Message Queue is often not a great use case for Cassandra. For information on how
to handle high delete workloads see http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>> 
>> It is hard to create a model without some idea of the data load, but I would suggest you start with:
>> 
>> CF: UserMessages
>> Key: ReceiverID
>> Columns : column name = TimeUUID ; column value = message ID and Body
>> 
>> That will order the messages by time. 
>> 
>> Depending on load (and to support deleting a previous month's messages) you may want to partition the rows by month:
>> 
>> CF: UserMessagesMonth
>> Key: ReceiverID+YYYYMM
>> Columns : column name = TimeUUID ; column value = message ID and Body
>> 
>> Everything is the same as before, but now a user has a row for each month, which you can delete as a whole. This also helps avoid very big rows.
>> 
>>> I really don't think that storage will be an issue, I have 2TB per node, messages are limited to 1KB.
>> I would suggest you keep the per node limit to 300 to 400 GB. It can take a long
time to compact, repair and move the data when it gets above 400GB. 
>> 
>> Hope that helps. 
>> 
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
>> 
>>> Hi everyone !
>>> 
>>> I'm fairly new to Cassandra and I'm not yet familiar with the column-oriented NoSQL model.
>>> I have worked on it for a while, but I can't seem to find the best model for what I'm looking for.
>>> 
>>> I have Erlang software that lets users connect and communicate with each other. When a user (A) sends
>>> a message to a disconnected user (B), it stores the message in the database and waits for user (B)
>>> to connect, retrieve the message queue, and delete it.
>>> 
>>> Here are some key points:
>>> - Users are identified by integer IDs
>>> - Each message is uniquely identified by the combination of: Sender ID - Receiver ID - Message ID - time
>>> 
>>> I have a message queue, and here are the operations I would need to do as fast as possible:
>>> 
>>> - Store from 1 to X messages per registered user.
>>> - Get the number of stored messages per user (can be an incremental variable updated at each store // this is often retrieved).
>>> - Retrieve all messages from a user at once.
>>> - Delete all messages from a user at once.
>>> - Delete all messages that are older than Y months (from all users).
>>> 
>>> I really don't think that storage will be an issue, I have 2TB per node, messages are limited to 1KB.
>>> I'm really looking for speed rather than storage optimization.
>>> 
>>> My configuration is 2 dedicated servers, each with:
>>> - 4 x Intel i7 2.66 GHz
>>> - 64 bits
>>> - 24 GB
>>> - 2 TB
>>> 
>>> Thank you all.
>> 
> 
> 
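
The month-partitioned layout quoted above (row key `ReceiverID+YYYYMM`, TimeUUID column names) can be sketched with a small in-memory stand-in. This is only an illustration of the data layout, not Cassandra client code: the class, its method names, and the literal `+` key separator are hypothetical, and Python's `uuid.uuid1()` is used as a stand-in for a TimeUUID.

```python
import uuid
from collections import defaultdict


class UserMessagesMonth:
    """In-memory stand-in for the UserMessagesMonth column family (illustrative only)."""

    def __init__(self):
        # row key -> {TimeUUID column name: message body}
        self.rows = defaultdict(dict)

    @staticmethod
    def row_key(receiver_id: int, yyyymm: str) -> str:
        # Assumed key layout: ReceiverID+YYYYMM
        return f"{receiver_id}+{yyyymm}"

    def store(self, receiver_id: int, yyyymm: str, body: str) -> uuid.UUID:
        column = uuid.uuid1()  # time-based UUID, plays the role of the TimeUUID
        self.rows[self.row_key(receiver_id, yyyymm)][column] = body
        return column

    def fetch(self, receiver_id: int, yyyymm: str) -> list:
        # Return bodies ordered by the timestamp embedded in each column name.
        row = self.rows.get(self.row_key(receiver_id, yyyymm), {})
        return [row[c] for c in sorted(row, key=lambda u: u.time)]

    def drop_month(self, yyyymm: str) -> int:
        # Delete every user's row for one month; returns the number of rows removed.
        doomed = [k for k in self.rows if k.endswith("+" + yyyymm)]
        for key in doomed:
            del self.rows[key]
        return len(doomed)


cf = UserMessagesMonth()
for text in ("first", "second", "third"):
    cf.store(7, "201204", text)
cf.store(8, "201204", "hello")
cf.store(7, "201205", "later")
april_for_7 = cf.fetch(7, "201204")
removed = cf.drop_month("201204")
```

Here deleting a month means removing one whole row per user, which is the property the month (or day) bucket in the row key is meant to provide.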

