Subject: Re: Data model question, storing Queue Message
From: samal
To: user@cassandra.apache.org
Date: Mon, 30 Apr 2012 17:58:27 +0530

On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis <msegalis@gmail.com> wrote:
Hi Samal,

Thanks for the TTL feature, I wasn't aware of its existence.

Partitioning by day will give rows about 30 times narrower than partitioning by month (give or take ;-) ).
Each day should see something like 100,000 messages stored; most of them would be retrieved, and so deleted, before the TTL has to do its work.
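To put rough numbers on that volume, here is a back-of-the-envelope sketch. It assumes the 100,000 messages per day are spread across all users, that every message hits the 1 KB cap (taken as 1024 bytes), and that a month has 30 days; it ignores replication, column overhead and early deletes, so treat the result as an upper bound on raw message bytes only.

```python
MESSAGES_PER_DAY = 100_000   # figure quoted above, across all users
MAX_MESSAGE_BYTES = 1024     # "1 KB limited", taken as 1024 bytes (assumption)
DAYS_PER_MONTH = 30          # assumption matching the "about 30 times" remark


def worst_case_bytes(days: int) -> int:
    """Upper bound on raw message bytes written over `days` days."""
    return MESSAGES_PER_DAY * MAX_MESSAGE_BYTES * days


daily = worst_case_bytes(1)
monthly = worst_case_bytes(DAYS_PER_MONTH)
print(f"per day:   {daily / 1024**2:.1f} MiB")
print(f"per month: {monthly / 1024**3:.2f} GiB")
print(f"month/day ratio: {monthly // daily}")
```

These totals cover the whole cluster, not a single row; the width of one row depends on how many messages a single receiver accumulates in one day or month.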

The TTL is the longest a column can live in Cassandra; once it expires, the column is deleted. Deleting a column before its TTL expires is fine. Have you considered Kafka? http://incubator.apache.org/kafka/

On 30 Apr 2012 at 13:16, samal wrote:



On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <msegalis@gmail.com> wrote:
Hi Aaron,

Thank you for your answer, I was beginning to think that my question would never be answered ;-)

Actually, this is what I was going for, except for one thing: instead of partitioning rows per month, I thought about partitioning per day. That way I can launch the cleaning tool every day, and it will delete the day from X months earlier.

Use the column TTL feature: Cassandra removes the column once its TTL is over, so there is no need for a manual job.
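As a rough illustration of the TTL suggestion, the sketch below converts a retention period of "Y months" into seconds and builds a CQL INSERT with a USING TTL clause. The table name, the column names and the 30-day month are assumptions made up for the example; only the USING TTL clause itself is standard CQL.

```python
from datetime import timedelta


def ttl_seconds(months: int, days_per_month: int = 30) -> int:
    """Convert a retention period in months into a TTL in whole seconds."""
    return int(timedelta(days=months * days_per_month).total_seconds())


def insert_with_ttl(table: str, ttl: int) -> str:
    """Build a parameterised CQL INSERT whose columns expire after `ttl` seconds.

    `receiver_id`, `msg_time` and `body` are hypothetical column names.
    """
    return (
        f"INSERT INTO {table} (receiver_id, msg_time, body) "
        f"VALUES (?, ?, ?) USING TTL {ttl}"
    )


retention = ttl_seconds(3)  # keep undelivered messages for three months
print(retention)
print(insert_with_ttl("user_messages", retention))
```

With a TTL on every write, the "delete everything older than Y months" job is no longer needed; messages a user has already retrieved can still be deleted explicitly before they expire.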

I guess that will reduce the workload drastically. Does it have any downside compared to month partitioning?

A row key belongs to a particular node, so whether to partition by day or by month depends on the size of your data. Otherwise you can end up with a fat row, which will cause problems for the system.

At one point I was going to do something like the twissandra example: one CF for the users' queues, and another CF with a row per day storing the ID of every message of that day. That way, if I want to delete them, I only look into that row and use the IDs to delete the messages from the users' queue CF… Is that a good way to do it? Or should I stick with the first implementation?
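The two-CF idea described above can be sketched with plain dictionaries standing in for the column families. This is only an in-memory illustration of the bookkeeping, not Cassandra code; the names `user_queue`, `day_index`, `store` and `purge_day` are invented for the example.

```python
from collections import defaultdict

# In-memory stand-ins for the two column families (illustration only).
user_queue = defaultdict(dict)   # receiver_id -> {message_id: body}
day_index = defaultdict(list)    # "YYYYMMDD" -> [(receiver_id, message_id), ...]


def store(receiver_id, message_id, body, day):
    """Write the message to the receiver's queue and record it in the day's index row."""
    user_queue[receiver_id][message_id] = body
    day_index[day].append((receiver_id, message_id))


def purge_day(day):
    """Delete every message indexed under `day`, drop the index row, return the count removed."""
    removed = 0
    for receiver_id, message_id in day_index.pop(day, []):
        # The message may already be gone if the user fetched their queue.
        if user_queue[receiver_id].pop(message_id, None) is not None:
            removed += 1
    return removed


store(1, "m1", "hello", "20120129")
store(2, "m2", "hi", "20120129")
store(1, "m3", "later", "20120130")
user_queue[2].pop("m2")           # user 2 connected and fetched their queue
print(purge_day("20120129"))      # number of messages the purge actually removed
```

In this model every message costs two writes, and the index row still lists messages that were already retrieved, so the purge issues some redundant deletes.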

Best regards,

Morgan.

On 30 Apr 2012 at 05:52, aaron morton wrote:

Message Queue is often not a great use case for Cassandra. For information on how to handle high delete workloads see http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

It is hard to create a model without some idea of the data load, but I would suggest you start with:

CF: UserMessages
Key: ReceiverID
Columns : column name = TimeUUID ; column value = message ID and Body

That will order the messages by time.

Depending on load (and to support deleting a previous month's messages) you may want to partition the rows by month:

CF: UserMessagesMonth
Key: ReceiverID+YYYYMM
Columns : column name = TimeUUID ; column value = message ID and Body

Everything is the same as before, but now a user has a row for each month, which you can delete as a whole. This also helps avoid very big rows.
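A small sketch of how the keys and column names for this layout could be built on the client side. The ":" separator between ReceiverID and the date bucket is an assumption (the layout above only says "ReceiverID+YYYYMM"), the per-day variant is the alternative discussed elsewhere in this thread, and the helper names are invented for the example.

```python
import uuid
from datetime import date


def month_row_key(receiver_id: int, day: date) -> str:
    """Row key for the UserMessagesMonth layout: ReceiverID + YYYYMM."""
    return f"{receiver_id}:{day:%Y%m}"


def day_row_key(receiver_id: int, day: date) -> str:
    """Per-day variant of the same key: ReceiverID + YYYYMMDD."""
    return f"{receiver_id}:{day:%Y%m%d}"


def new_column_name() -> uuid.UUID:
    """Version-1 (time-based) UUID, the kind of value a TimeUUID column name holds."""
    return uuid.uuid1()


d = date(2012, 4, 30)
print(month_row_key(42, d))       # 42:201204
print(day_row_key(42, d))         # 42:20120430
print(new_column_name().version)  # 1
```

Deleting a user's whole month then means deleting the single row with that key, and reading a user's queue means reading the rows for each month (or day) still inside the retention window.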

I really don't think that storage will be an issue, I have 2 TB per node, messages are limited to 1 KB.
I would suggest you keep the per-node limit to 300 to 400 GB. It can take a long time to compact, repair and move the data when it gets above 400 GB.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:

Hi everyone !

I'm fairly new to Cassandra and I'm not yet familiar with the column-oriented NoSQL model.
I have worked on it for a while, but I can't seem to find the best model for what I'm looking for.

I have Erlang software that lets users connect and communicate with each other. When a user (A) sends
a message to a disconnected user (B), it stores the message in the database and waits for user (B) to connect and retrieve
the message queue, then deletes it.

Here are some key points:
- Users are identified by integer IDs
- Each message is unique by the combination of: Sender ID - Receiver ID - Message ID - time

I have a message queue, and here are the operations I would need to do as fast as possible:

- Store from 1 to X messages per registered user
- Get the number of stored messages per user (can be an incremental variable updated at each store // this is often retrieved)
- Retrieve all messages from a user at once.
- Delete all messages from a user at once.
- Delete all messages that are older than Y months (from all users).

I really don't think that storage will be an issue, I have 2 TB per node, messages are limited to 1 KB.
I'm really looking for speed rather than storage optimization.

My configuration is 2 dedicated servers, each with:
- 4 x Intel i7 2.66 GHz
- 64 bits
- 24 GB
- 2 TB

Thank you all.




