Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of openvictor@gmail.com
 designates 209.85.210.172 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :content-type;
        b=c710Nlb14f1iceNG+rqj2DbpK8oWUmov1bijKMb0JE0Ffu4ptiehcm6ThufslJN1IX
         /bSK6VmmRyFwyz48ycdfiknsNS6yq0IEnt3GEk5sUY86EpIECGWNoSheaz0GAiqBeyzI
         s5ORl8/U2rBSSBN8X6pLrJT/4QArc84JqeOs0=
MIME-Version: 1.0
In-Reply-To: <906623C8-0BBD-4BA8-9735-18803E4AEA0E@thelastpickle.com>
References: <BANLkTimt1etUapzkzj=paSbg9_nTPThh5w@mail.gmail.com>
	<906623C8-0BBD-4BA8-9735-18803E4AEA0E@thelastpickle.com>
Date: Wed, 25 May 2011 10:57:54 -0400
Message-ID: <BANLkTiksXhYPJOSvJRfKkQ5JwnbAciZwiQ@mail.gmail.com>
Subject: Re: Recommandation on how to organize CF
From: openvictor Open <openvictor@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=90e6ba53a4a8f0d87004a41aed91

--90e6ba53a4a8f0d87004a41aed91
Content-Type: text/plain; charset=ISO-8859-1

Thanks Aaron,

Sorry I didn't see your message sooner.

The Messages CF (UTF8Type) holds information such as who has the right to
read a message, whether it is possible to reply to a list, etc. There are
two kinds of row keys: those that begin with "message:uuid" and those that
begin with "messagelist:uuid". A column of a message:uuid row is, for
example, "sender" or "rawtext". A column of a messagelist:uuid row is, for
example, "creator" or "participants".

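To make that layout concrete, here is a minimal sketch. Plain Python
dictionaries stand in for the column family; the sample values and the
get_row helper are invented for illustration and are not Cassandra client
calls.

```python
import uuid

# In-memory stand-in for the "message" column family:
# row key -> {column name -> value}. All IDs and values are made up.
message_id = uuid.uuid4()
list_id = uuid.uuid4()

message_cf = {
    f"message:{message_id}": {
        "sender": "openvictor",
        "rawtext": "Hello!",
    },
    f"messagelist:{list_id}": {
        "creator": "openvictor",
        "participants": "openvictor,aaron",
    },
}

def get_row(cf, kind, identifier):
    """Fetch a row by its prefixed key, e.g. kind='message'."""
    return cf.get(f"{kind}:{identifier}", {})

# With a UTF8Type comparator, column names are ordered as strings.
columns = sorted(get_row(message_cf, "message", message_id))
```

Both kinds of rows share one CF and are told apart only by the key prefix.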

MessagesTime (message_time) is the sorting mechanism: when I query
message_time I get messages or messagelists in the order they were sent.
There are 2 kinds of row keys:
"messagebox:someone" : each column has for its Name the UUID of the last
message in a messagelist inside someone's messagebox, and for its Value the
UUID of that messagelist. This gives me a sort order based on the last
message received.
"messagelist:uuid" : each column has for its Name the UUID of a message; the
Value does not matter.
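
A rough sketch of that sorting index, again with dictionaries in place of the
CF. Several things here are assumptions made for the example: the time_uuid
helper is hypothetical, sorting on the UUID timestamp only approximates what
a TimeUUIDType comparator does, and the old messagebox column is assumed to
be removed when a newer message arrives in the same list.

```python
import uuid

def time_uuid(ticks, node=0x123456789ABC):
    """Build a version 1 UUID from a raw timestamp (hypothetical helper)."""
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, node))

message_time_cf = {}  # row key -> {TimeUUID column name -> value}

def record_message(box_key, list_id, msg_uuid, previous_uuid=None):
    """Index a new message in its messagelist row and in the messagebox row."""
    box_row = message_time_cf.setdefault(box_key, {})
    if previous_uuid is not None:
        # Assumption: drop the column that pointed at the list's older message.
        box_row.pop(previous_uuid, None)
    box_row[msg_uuid] = list_id
    # messagelist row: Name is the message UUID, Value is unused.
    message_time_cf.setdefault(f"messagelist:{list_id}", {})[msg_uuid] = ""

def lists_newest_first(box_key):
    """Return messagelist IDs ordered by their most recent message."""
    row = message_time_cf.get(box_key, {})
    return [row[name] for name in sorted(row, key=lambda u: u.time, reverse=True)]

BOX = "messagebox:openvictor:nameofbox"
m1, m2, m3 = time_uuid(1000), time_uuid(2000), time_uuid(3000)
record_message(BOX, "list-A", m1)
record_message(BOX, "list-B", m2)
before = lists_newest_first(BOX)
record_message(BOX, "list-A", m3, previous_uuid=m1)  # new message in list-A
after = lists_newest_first(BOX)
```

After the third message, list-A moves back to the top of the messagebox.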

Your suggestion is a very good solution, but there is one thing I don't
really like about serialization: it "blocks" evolution. Let's say I want to
add one field to a message. I would have to write a tool that deserializes
each message, adds the new field, reserializes all the fields and inserts
the result. Even if I serialize with JSON, it looks like schema evolution
(which is why I chose Cassandra) is a little bit broken. If I am wrong,
please tell me so.
However, I will explore this very interesting possibility for another project
with "tags", which is not really subject to dramatic evolution.
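
The difference can be sketched as follows, with dictionaries standing in for
rows. The "priority" field, the "data" column name and the add_field_to_blob
helper are invented for the example.

```python
import json

# Layout A: one column per field. Adding a field is a single column insert.
row_columns = {"sender": "openvictor", "rawtext": "Hello!"}
row_columns["priority"] = "high"

# Layout B: the whole message serialized as JSON into one column.
row_blob = {"data": json.dumps({"sender": "openvictor", "rawtext": "Hello!"})}

def add_field_to_blob(row, name, value):
    """Read-modify-write: deserialize, add the field, reserialize, store."""
    doc = json.loads(row["data"])
    doc[name] = value
    row["data"] = json.dumps(doc, sort_keys=True)

add_field_to_blob(row_blob, "priority", "high")
stored = json.loads(row_blob["data"])
```

A reader that tolerates missing keys, for instance doc.get("priority",
"normal"), would not need old rows to be rewritten, at the cost of handling
the default in application code.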

At the moment I have no real complaints about speed, and it is not really
time critical (after all, who cares if the messagebox loads in 250 ms instead
of 200 ms). I currently get the messages with two batch Cassandra calls, so
I think this is satisfactory.
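
A minimal sketch of what such a two-call read could look like. It assumes the
first call slices the index row and the second fetches the message rows in
one batch. get_slice and multiget are in-memory stand-ins written for this
example, and integers replace the TimeUUID column names.

```python
# Index row: column names are message IDs in time order, values unused.
message_time_cf = {"messagelist:L1": {1: "", 2: "", 3: ""}}
message_cf = {
    "message:1": {"sender": "openvictor", "rawtext": "first"},
    "message:2": {"sender": "aaron", "rawtext": "second"},
    "message:3": {"sender": "openvictor", "rawtext": "third"},
}
calls = []  # records each simulated round trip

def get_slice(cf, row_key, count):
    """Call 1: newest `count` column names of one index row."""
    calls.append("get_slice")
    return sorted(cf.get(row_key, {}), reverse=True)[:count]

def multiget(cf, row_keys):
    """Call 2: fetch several rows in one batch."""
    calls.append("multiget")
    return {key: cf[key] for key in row_keys if key in cf}

def load_message_list(list_id, count=20):
    names = get_slice(message_time_cf, f"messagelist:{list_id}", count)
    keys = [f"message:{name}" for name in names]
    rows = multiget(message_cf, keys)
    return [rows[key] for key in keys if key in rows]

texts = [m["rawtext"] for m in load_message_list("L1", count=2)]
```

The cost stays at two round trips however many messages the slice returns.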

Thanks again, the JSON serialization looks like a very interesting
possibility.

Victor

2011/5/19 aaron morton <aaron@thelastpickle.com>

> I'm a bit confused by your examples. I think you are saying...
>
> - Standard CF called Message using the UTF8Type for column comparisons used
> to store the individual messages. Row key is the message UUID. Not sure what
> the columns are.
> - Standard CF called MessageTime using TimeUUIDType for column comparison,
> used to store collections of messages. Row key is
> "messagelist:<message_list_uuid>" for a message list, and
> "messagebox:<user_name>:<mbox_name>" for message box. Not sure what the
> columns are.
>
> The best model is going to be the one that supports your read requests and
> the volume of data you are expecting.
>
> One way to go is to de normalise to support very fast read paths. You could
> store the entire message in one column using something like JSON to
> serialise it. Then
>
> - MessageIndexes standard CF to store the full messages in context, there
> are three different types of rows:
>        * keys with <user_name>  store all messages for a user, column name
> is the message TimeUUID and value is the message structure
>        * keys with <user_name>/<mbox_name> store the messages for a single
> message box. Columns same as above.
>        * keys with <user_name>/<mbox_name>/<mlist_name> store the messages
> in a single message list. Columns as above.
>
> - MessageFolders CF to store the message box and message lists, two
> approaches:
>        1) <user_name> as key and each column is a message box, message
> lists are stored in a single column as JSON
>        2) <user_name> row for the top level message box, column for each
> message box. <user_name>/<message_box> for the next level,
>
> Or if space is a concern just store the UUID of the message in the index CF
> and add a CF to store the messages.
>
> It is also going to depend on the management features, e.g. can you rename
> a message box / list ? Move messages around ? If so the de normalised
> pattern may not be the best as those operations will take longer.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19 May 2011, at 05:44, openvictor Open wrote:
>
> > Hello all,
> >
> > I know organization is a broad topic and everybody may have an idea on
> > how to do it, but I really want to have some advice and opinions and I
> > think it could be interesting to discuss this matter.
> >
> > Here is my problem: I am designing a messaging system internal to a
> > website. There are 3 big structures: Message, MessageList and
> > MessageBox. A message/messagelist is identified only by a UUID; a
> > MessageBox is identified by a name (UTF-8 string). A messagebox has a
> > set of MessageLists in it and a messagelist has a set of messages in it,
> > all of them being UUIDs.
> > Currently I have only two CFs: message and message_time. Message is a
> > UTF8Type (Cassandra 0.6.11, soon going for 0.8) and message_time is a
> > TimeUUIDType.
> >
> > For example, if I want to request all messages in a certain messagelist
> > I do: message_time['messagelist:uuid(messagelist)']
> > If I want information about a message I do:
> > message['message:uuid(message)']
> > If I want all messagelists for a certain messagebox (called nameofbox
> > for user openvictor in this example) I do:
> > message_time['messagebox:openvictor:nameofbox']
> >
> > My question to Cassandra users is: is it a good idea to group all those
> > things into two CFs? Are there advantages / drawbacks to these two CFs,
> > and in the long term should I change my organization?
> >
> > Thank you,
> > Victor
>
>

--90e6ba53a4a8f0d87004a41aed91--