incubator-cassandra-user mailing list archives

From samal <samalgo...@gmail.com>
Subject Re: Data model question, storing Queue Message
Date Mon, 30 Apr 2012 11:16:50 GMT
On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <msegalis@gmail.com> wrote:

> Hi Aaron,
>
> Thank you for your answer, I was beginning to think that my question would
> never be answered ;-)
>
> Actually, this is what I was going for, except for one thing: instead of
> partitioning rows per month, I thought about partitioning per day. That way
> I can launch the cleaning tool every day, and it will delete the day from X
> months earlier.
>

Use the TTL feature of columns: a column is removed automatically once its
TTL expires, so no manual cleanup job is needed.
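
A minimal sketch of what that could look like, assuming a CQL3-style table.
The table name user_messages, its columns, and the 3-month retention are
made up for illustration, and a month is approximated as 30 days:

```python
from datetime import timedelta

# Hypothetical retention period: "Y months", approximated as 30-day months.
RETENTION_MONTHS = 3


def ttl_seconds(months: int) -> int:
    """Convert a retention period in (30-day) months to a TTL in seconds."""
    return int(timedelta(days=30 * months).total_seconds())


def insert_with_ttl(months: int) -> str:
    """Build a CQL INSERT that lets the written columns expire on their own.

    Table and column names are illustrative assumptions, not from the thread.
    """
    return (
        "INSERT INTO user_messages (receiver_id, msg_time, body) "
        f"VALUES (?, ?, ?) USING TTL {ttl_seconds(months)}"
    )


print(insert_with_ttl(RETENTION_MONTHS))
```

Note that expired columns still have to be compacted away, so the disk
space is not freed the moment the TTL runs out.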

> I guess that will reduce the workload drastically. Does it have any
> downside compared to month partitioning?
>

A row key belongs to a particular node, so whether you partition by day or
by month matters, depending on the size of your data. Otherwise you can end
up with a fat row, which will cause problems for the system.
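
To get a feeling for the difference, here is a rough sketch that builds the
ReceiverID+YYYYMM key (or a daily variant) and estimates the row payload.
The literal "+" separator, the function names, and the load of 10,000
messages per day are assumptions for illustration; only the 1 KB message
limit comes from this thread:

```python
from datetime import datetime, timezone

MESSAGE_BYTES = 1024  # messages are limited to 1 KB in this thread


def row_key(receiver_id: int, when: datetime, per_day: bool = False) -> str:
    """Build a row key of the form ReceiverID+YYYYMM (or ReceiverID+YYYYMMDD)."""
    bucket = when.strftime("%Y%m%d" if per_day else "%Y%m")
    return f"{receiver_id}+{bucket}"


def row_size_bytes(messages_per_day: int, days_in_bucket: int) -> int:
    """Rough upper bound on one row's payload, ignoring per-column overhead."""
    return messages_per_day * days_in_bucket * MESSAGE_BYTES


when = datetime(2012, 4, 30, 11, 16, tzinfo=timezone.utc)
print(row_key(42, when))                 # monthly bucket
print(row_key(42, when, per_day=True))   # daily bucket

# Hypothetical load: 10,000 queued messages per day for one receiver.
print(row_size_bytes(10_000, 1))    # payload of a daily row
print(row_size_bytes(10_000, 30))   # payload of a (30-day) monthly row
```

With numbers like these a monthly row is about 30 times larger than a daily
one, so the right bucket size depends on how many messages a single receiver
can accumulate.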



> At one point I was going to do something like the twissandra example:
> having a CF per user's queue and another CF per day storing the ID of every
> message of that day. That way, if I want to delete them, I only look
> into this row and use the IDs to delete the messages from the user's
> queue CF. Is that a good way to do it? Or should I stick with the first
> implementation?
>
> Best regards,
>
> Morgan.
>
> On 30 Apr 2012, at 05:52, aaron morton wrote:
>
> Message Queue is often not a great use case for Cassandra. For information
> on how to handle high delete workloads see
> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>
> It is hard to create a model without some idea of the data load, but I would
> suggest you start with:
>
> CF: UserMessages
> Key: ReceiverID
> Columns : column name = TimeUUID ; column value = message ID and Body
>
> That will order the messages by time.
>
> Depending on load (and to support deleting a previous month's messages) you
> may want to partition the rows by month:
>
> CF: UserMessagesMonth
> Key: ReceiverID+YYYYMM
> Columns : column name = TimeUUID ; column value = message ID and Body
>
> Everything is the same as before, but now a user has a row for each month,
> which you can delete as a whole. This also helps avoid very big rows.
>
> I really don't think that storage will be an issue, I have 2TB per node,
> messages are limited to 1KB.
>
> I would suggest you keep the per node limit to 300 to 400 GB. It can take
> a long time to compact, repair and move the data when it gets above 400GB.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
>
> Hi everyone !
>
> I'm fairly new to Cassandra and not yet familiar with the column-oriented
> NoSQL model.
> I have worked on it for a while, but I can't seem to find the best model for
> what I'm looking for.
>
> I have Erlang software that lets users connect and communicate with
> each other. When a user (A) sends a message to a disconnected user (B),
> it stores the message in the database, waits for user (B) to connect
> and retrieve the message queue, and then deletes it.
>
> Here are some key points:
> - Users are identified by integer IDs
> - Each message is uniquely identified by the combination of: Sender ID -
> Receiver ID - Message ID - time
>
> I have a message queue, and here are the operations I need to perform as
> fast as possible:
>
> - Store from 1 to X messages per registered user
> - Get the number of stored messages per user (can be an incrementing
> counter updated at each store; this is retrieved often)
> - Retrieve all messages from a user at once.
> - Delete all messages from a user at once.
> - Delete all messages that are older than Y months (from all users).
>
> I really don't think that storage will be an issue, I have 2TB per node,
> messages are limited to 1KB.
> I'm really looking for speed rather than storage optimization.
>
> My configuration is 2 dedicated servers, each with:
> - 4 x Intel i7 2.66 GHz
> - 64 bits
> - 24 GB
> - 2 TB
>
> Thank you all.
>
>
>
>
