cassandra-user mailing list archives

From samal <samalgo...@gmail.com>
Subject Re: Data model question, storing Queue Message
Date Mon, 30 Apr 2012 12:28:27 GMT
On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis <msegalis@gmail.com> wrote:

> Hi Samal,
>
> Thanks for pointing out the TTL feature, I wasn't aware of its existence.
>
> Partitioning by day will give rows about 30 times narrower than partitioning
> by month (give or take ;-) ).
> Each day should see something like 100,000 messages stored; most of them
> will be retrieved, and therefore deleted, before the TTL has to do its
> work.
>
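
For a rough sense of scale, the figures quoted in this thread can be combined in a back-of-the-envelope calculation. The sketch below assumes that the 100,000 messages per day is the total across all users and that every message hits the 1 KB cap mentioned in the original post. It ignores replication, on-disk overhead, and the fact that most messages are deleted once read, so it is at best an upper bound on raw payload. All constant and function names are made up for illustration.

```python
# Back-of-the-envelope sizing; every figure is an assumption taken from the thread.
MESSAGES_PER_DAY = 100_000         # Morgan's estimate, assumed to be a global total
MAX_MESSAGE_BYTES = 1024           # messages are capped at 1 KB
NODE_BUDGET_BYTES = 300 * 10**9    # lower end of the suggested 300 to 400 GB per node


def raw_bytes_per_day(messages=MESSAGES_PER_DAY, size=MAX_MESSAGE_BYTES):
    """Raw payload written per day if every message is at the size cap."""
    return messages * size


def days_to_fill(budget=NODE_BUDGET_BYTES):
    """Days of raw payload that fit in the budget if nothing were ever deleted."""
    return budget / raw_bytes_per_day()


if __name__ == "__main__":
    print(f"raw payload per day: {raw_bytes_per_day() / 10**6:.1f} MB")
    print(f"days to reach the node budget: {days_to_fill():.0f}")
```

Under these assumptions the daily volume is small compared with the per-node figures discussed in the thread, which is consistent with the view that row width and delete load matter more here than raw storage.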

TTL sets the longest a column can live in Cassandra; once it expires, the
column is deleted automatically. Deleting a column before its TTL expires is fine.
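
To make the TTL behaviour concrete, here is a minimal in-memory sketch of the semantics described above: a column carries an expiry time, reads stop returning it once that time has passed, and an explicit delete before expiry is harmless. This is a toy model, not a Cassandra client; the class `TtlRow` and its methods are hypothetical names invented for the illustration.

```python
import time


class TtlRow:
    """Toy stand-in for a single row whose columns expire. NOT a Cassandra API."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable clock, so expiry can be simulated
        self._columns = {}       # column name -> (value, expires_at)

    def insert(self, name, value, ttl_seconds):
        """Store a column that disappears ttl_seconds from now."""
        self._columns[name] = (value, self._clock() + ttl_seconds)

    def delete(self, name):
        """Explicit delete; safe whether or not the column has already expired."""
        self._columns.pop(name, None)

    def get(self, name):
        """Return the value, or None if the column is absent or expired."""
        entry = self._columns.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._columns[name]   # lazily drop the expired column
            return None
        return value


if __name__ == "__main__":
    row = TtlRow()
    row.insert("msg-1", "hello", ttl_seconds=90 * 24 * 3600)
    print(row.get("msg-1"))   # still readable well before the TTL
    row.delete("msg-1")       # the receiver fetched it, so delete it early
    print(row.get("msg-1"))
```

In this model the TTL only acts as a safety net for messages that are never retrieved, which matches the usage discussed in this thread.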
Have you considered Kafka? http://incubator.apache.org/kafka/




> On 30 Apr 2012, at 13:16, samal wrote:
>
>
>
> On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <msegalis@gmail.com> wrote:
>
>> Hi Aaron,
>>
>> Thank you for your answer, I was beginning to think that my question
>> would never be answered ;-)
>>
>> Actually, this is what I was going for, except for one thing: instead of
>> partitioning rows per month, I thought about partitioning per day. That way
>> I can run the cleaning tool every day, and it will delete the day from X
>> months earlier.
>>
>
> Use the column TTL feature: it removes the column once the TTL is over (no
> need for a manual job).
>
>> I guess that will reduce the workload drastically. Does it have any
>> downside compared to month partitioning?
>>
>
> A row key belongs to a particular node, so depending on the size of your
> data, the choice between day-wise and month-wise partitioning matters.
> Otherwise it can lead to fat rows, which will cause system problems.
>
>
>
>> At one point I was going to do something like the Twissandra example:
>> having a CF per user's queue, and another CF per day storing the ID of every
>> message of that day. That way, if I want to delete them, I only look
>> into this row and use the IDs to delete the messages from the user's
>> queue CF… Is that a good way to do it? Or should I stick with the first
>> implementation?
>>
>> Best regards,
>>
>> Morgan.
>>
>> On 30 Apr 2012, at 05:52, aaron morton wrote:
>>
>> A message queue is often not a great use case for Cassandra. For
>> information on how to handle high-delete workloads, see
>> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>>
>> It is hard to create a model without some idea of the data load, but I would
>> suggest you start with:
>>
>> CF: UserMessages
>> Key: ReceiverID
>> Columns : column name = TimeUUID ; column value = message ID and Body
>>
>> That will order the messages by time.
>>
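
As an illustration of why TimeUUID column names come back in time order: a version-1 UUID embeds a timestamp, and Cassandra's TimeUUID comparator sorts on that timestamp rather than on the raw bytes. The sketch below reproduces the idea with Python's standard `uuid` module only; the helper names are hypothetical and no Cassandra client is involved.

```python
import time
import uuid

# Number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def sort_by_embedded_time(ids):
    """Order version-1 UUIDs by their embedded timestamp."""
    return sorted(ids, key=lambda u: u.time)


def as_unix_seconds(u):
    """Convert the embedded 100 ns timestamp of a version-1 UUID to Unix seconds."""
    return (u.time - UUID_EPOCH_OFFSET) / 1e7


if __name__ == "__main__":
    names = [uuid.uuid1() for _ in range(3)]   # stand-ins for column names
    for u in sort_by_embedded_time(reversed(names)):
        print(u, time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(as_unix_seconds(u))))
```

Sorting the UUIDs as plain strings would not give the same result in general, because the low bits of the timestamp come first in the textual form.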
>> Depending on load (and to support deleting a previous month's messages)
>> you may want to partition the rows by month:
>>
>> CF: UserMessagesMonth
>> Key: ReceiverID+YYYYMM
>> Columns : column name = TimeUUID ; column value = message ID and Body
>>
>> Everything is the same as before, but now a user has a row for each month,
>> which you can delete as a whole. This also helps avoid very big rows.
>>
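
The partitioned row key can be sketched as follows. This is only an illustration of the ReceiverID+YYYYMM idea and of the per-day variant raised elsewhere in the thread; the `:` separator, the function names, and the month arithmetic helper are assumptions, not something the thread specifies.

```python
from datetime import date


def month_row_key(receiver_id, day):
    """Row key for one receiver and one calendar month, e.g. '42:201204'."""
    return f"{receiver_id}:{day:%Y%m}"


def day_row_key(receiver_id, day):
    """Per-day variant of the same key, e.g. '42:20120430'."""
    return f"{receiver_id}:{day:%Y%m%d}"


def months_before(year, month, n):
    """Return (year, month) for the month n months before the given one."""
    index = year * 12 + (month - 1) - n
    return index // 12, index % 12 + 1


def expired_month_row_key(receiver_id, today, keep_months):
    """Key of the row that has just fallen out of a keep_months retention window."""
    year, month = months_before(today.year, today.month, keep_months)
    return month_row_key(receiver_id, date(year, month, 1))


if __name__ == "__main__":
    today = date(2012, 4, 30)
    print(month_row_key(42, today))
    print(day_row_key(42, today))
    print(expired_month_row_key(42, today, keep_months=6))
```

A cleanup job built this way only has to compute one key per receiver and delete that row as a whole, instead of scanning for individual old columns.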
>> I really don't think that storage will be an issue, I have 2 TB per node,
>> messages are limited to 1 KB.
>>
>> I would suggest you keep the per node limit to 300 to 400 GB. It can take
>> a long time to compact, repair and move the data when it gets above 400GB.
>>
>> Hope that helps.
>>
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
>>
>> Hi everyone !
>>
>> I'm fairly new to Cassandra and not yet familiar with the
>> column-oriented NoSQL model.
>> I have worked on it for a while, but I can't seem to find the best model for
>> what I'm looking for.
>>
>> I have Erlang software that lets users connect and communicate with
>> each other. When a user (A) sends a message to a disconnected user (B),
>> it stores the message in the database and waits for user (B) to connect,
>> retrieve the message queue, and delete it.
>>
>> Here are some key points:
>> - Users are identified by integer IDs
>> - Each message is uniquely identified by the combination of: Sender ID -
>> Receiver ID - Message ID - time
>>
>> I have a message queue, and here are the operations I need to do as
>> fast as possible:
>>
>> - Store from 1 to X messages per registered user
>> - Get the number of stored messages per user (can be an incremental
>> variable updated at each store // this is often retrieved)
>> - Retrieve all messages from a user at once.
>> - Delete all messages from a user at once.
>> - Delete all messages that are older than Y months (from all users).
>>
>> I really don't think that storage will be an issue, I have 2 TB per node,
>> messages are limited to 1 KB.
>> I'm really looking for speed rather than storage optimization.
>>
>> My configuration is 2 dedicated servers, each with:
>> - 4 x Intel i7 2.66 GHz
>> - 64 bits
>> - 24 GB
>> - 2 TB
>>
>> Thank you all.
>>
>>
>>
>>
>
>
