Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of samalgorai@gmail.com designates
 209.85.212.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <83E0C4FB-12C7-40F9-B02C-2461005C22C8@gmail.com>
References: <9921B592-8A19-4D4B-B18F-343F17A75DB4@gmail.com>
	<E3B8DBDE-EAB4-43AA-BA34-3CA0862EBAE1@thelastpickle.com>
	<79DD1E61-21CA-4C7B-8D4E-3D61B5896D86@gmail.com>
	<CAP=kKf_OHrMXEikD2ifupOj2nxzYTNxO67v0e4DyvtKM0YXKzg@mail.gmail.com>
	<83E0C4FB-12C7-40F9-B02C-2461005C22C8@gmail.com>
Date: Mon, 30 Apr 2012 17:58:27 +0530
Message-ID: 
 <CAP=kKf9KxGohmYWdTABN_Ym+rieUgkDh659nzsB81yP8u-CR1A@mail.gmail.com>
Subject: Re: Data model question, storing Queue Message
From: samal <samalgorai@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=f46d0442811e57fb7f04bee497c9

--f46d0442811e57fb7f04bee497c9
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis <msegalis@gmail.com> wrote:

> Hi Samal,
>
> Thanks for pointing out the TTL feature, I wasn't aware of its existence.
>
> Day partitioning will give rows that are narrower than with month
> partitioning (about 30 times narrower, give or take ;-) )
> Per day there should be something like 100,000 messages stored; most of
> them would be retrieved, and so deleted, before the TTL feature comes to
> do its work.
>

The TTL is the longest a column can live in the Cassandra world; once it
expires, the column is deleted. Deleting a column before its TTL expires is
fine.
Have you considered Kafka? http://incubator.apache.org/kafka/
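
A minimal sketch of the TTL behaviour described above, in plain Python
rather than against a real Cassandra client. The `Column` class, its fields,
and the `ttl_for_days` helper are hypothetical names made up for
illustration. They only model the idea that a column stops being readable
once its TTL has elapsed, and that deleting it earlier is allowed.

```python
from dataclasses import dataclass
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def ttl_for_days(days: int) -> int:
    """Convert a retention period in days to a TTL in seconds."""
    return days * SECONDS_PER_DAY


@dataclass
class Column:
    """Hypothetical in-memory stand-in for a column written with a TTL."""
    name: str
    value: bytes
    written_at: int                   # write time, seconds since the epoch
    ttl_seconds: int                  # lifetime requested at write time
    deleted_at: Optional[int] = None  # set when the column is deleted early

    def delete(self, now: int) -> None:
        # Explicit delete before the TTL runs out; the TTL is only an upper bound.
        self.deleted_at = now

    def is_live(self, now: int) -> bool:
        # A column is gone if it was deleted explicitly or its TTL has elapsed.
        if self.deleted_at is not None and now >= self.deleted_at:
            return False
        return now < self.written_at + self.ttl_seconds


# A message kept for at most 90 days (the retention period is an assumption).
msg = Column("msg-1", b"hello", written_at=0, ttl_seconds=ttl_for_days(90))
```

In this model a message that user B retrieves and deletes after one day is
gone from that point on, and a message that is never retrieved disappears by
itself after 90 days, with no cleaning job involved.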


> On 30 Apr 2012, at 13:16, samal wrote:
>
>
>
> On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <msegalis@gmail.com> wrote:
>
>> Hi Aaron,
>>
>> Thank you for your answer, I was beginning to think that my question
>> would never be answered ;-)
>>
>> Actually, this is what I was going for, except for one thing: instead of
>> partitioning rows per month, I thought about partitioning per day, so
>> that every day I launch the cleaning tool and it deletes the day from X
>> months earlier.
>>
>
> Use the TTL feature of columns: it removes a column once its TTL is over
> (no need for a manual job).
>
>> I guess that will reduce the workload drastically. Does it have any
>> downside compared to month partitioning?
>>
>
> A row key belongs to a particular node, so depending on the size of your
> data, the choice between day-wise and month-wise partitioning matters.
> Otherwise it can lead to a fat row, which will cause problems for the
> system.
>
>
>
>> At one point I was going to do something like the twissandra example:
>> having a CF per user's queue, and another CF per day storing the ID of
>> every message of the day. That way, if I want to delete them, I only
>> look into this row and use the IDs to delete the messages from the
>> user's queue CF… Is that a good way to do it? Or should I stick with
>> the first implementation?
>>
>> Best regards,
>>
>> Morgan.
>>
>> On 30 Apr 2012, at 05:52, aaron morton wrote:
>>
>> Message Queue is often not a great use case for Cassandra. For
>> information on how to handle high delete workloads see
>> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>>
>> It is hard to create a model without some idea of the data load, but I
>> would suggest you start with:
>>
>> CF: UserMessages
>> Key: ReceiverID
>> Columns : column name = TimeUUID ; column value = message ID and Body
>>
>> That will order the messages by time.
>>
>> Depending on load (and to support deleting a previous month's messages)
>> you may want to partition the rows by month:
>>
>> CF: UserMessagesMonth
>> Key: ReceiverID+YYYYMM
>> Columns : column name = TimeUUID ; column value = message ID and Body
>>
>> Everything is the same as before, but now a user has a row for each
>> month, which you can delete as a whole. This also helps avoid very big
>> rows.
>>
>>> I really don't think that storage will be an issue, I have 2TB per node,
>>> messages are limited to 1KB.
>>
>> I would suggest you keep the per-node limit to 300 to 400 GB. It can
>> take a long time to compact, repair and move the data when it gets above
>> 400 GB.
>>
>> Hope that helps.
>>
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
>>
>> Hi everyone !
>>
>> I'm fairly new to Cassandra and I'm not yet familiar with the
>> column-oriented NoSQL model.
>> I have worked on it for a while, but I can't seem to find the best model
>> for what I'm looking for.
>>
>> I have Erlang software that lets users connect and communicate with
>> each other. When a user (A) sends a message to a disconnected user (B),
>> it stores the message in the database and waits for user (B) to connect,
>> retrieve the message queue, and delete it.
>>
>> Here are some key points:
>> - Users are identified by integer IDs
>> - Each message is unique by the combination of: Sender ID - Receiver ID -
>> Message ID - time
>>
>> I have a message queue, and here are the operations I would need to do
>> as fast as possible:
>>
>> - Store from 1 to X messages per registered user
>> - Get the number of stored messages per user (can be an incremental
>> variable updated at each store // this is often retrieved)
>> - Retrieve all messages from a user at once.
>> - Delete all messages from a user at once.
>> - Delete all messages that are older than Y months (from all users).
>>
>> I really don't think that storage will be an issue, I have 2TB per node,
>> messages are limited to 1KB.
>> I'm really looking for speed rather than storage optimization.
>>
>> My configuration is 2 dedicated servers, each with:
>> - 4 x Intel i7 2.66 GHz
>> - 64 bits
>> - 24 GB
>> - 2 TB
>>
>> Thank you all.
>>
>>
>>
>>
>
>
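
For the month versus day partitioning discussed in the quoted thread, here
is a small sketch of how the row keys could be built. The `row_key`
function, the `:` separator and the `bucket` argument are assumptions made
for illustration (the thread only specifies `ReceiverID+YYYYMM`), and
`uuid.uuid1()` from the Python standard library merely stands in for a
TimeUUID column name. Nothing here talks to Cassandra.

```python
import uuid
from datetime import datetime, timezone


def row_key(receiver_id: int, when: datetime, bucket: str = "month") -> str:
    """Build a row key of the form ReceiverID + time bucket.

    bucket="month" gives ReceiverID:YYYYMM, bucket="day" gives
    ReceiverID:YYYYMMDD. The ':' separator is an arbitrary choice.
    """
    if bucket == "month":
        suffix = when.strftime("%Y%m")
    elif bucket == "day":
        suffix = when.strftime("%Y%m%d")
    else:
        raise ValueError(f"unknown bucket: {bucket!r}")
    return f"{receiver_id}:{suffix}"


def new_column_name() -> uuid.UUID:
    """Version-1 (time-based) UUID, standing in for a TimeUUID column name."""
    return uuid.uuid1()


sent = datetime(2012, 4, 30, 12, 28, tzinfo=timezone.utc)
print(row_key(42, sent))         # 42:201204
print(row_key(42, sent, "day"))  # 42:20120430
```

With day buckets each receiver gets roughly 30 times as many rows per month,
each roughly 30 times narrower, which is the trade-off mentioned earlier in
the thread. Dropping old data then means deleting whole rows whose bucket is
older than the retention period, or letting a TTL do it.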

--f46d0442811e57fb7f04bee497c9--