Subject: Re: Recommandation on how to organize CF
From: openvictor Open
To: user@cassandra.apache.org
Date: Wed, 25 May 2011 10:57:54 -0400
In-Reply-To: <906623C8-0BBD-4BA8-9735-18803E4AEA0E@thelastpickle.com>

Thanks Aaron,

Sorry I didn't see your message sooner.

So the Messages CF (message), which uses UTF8Type, holds information such as who has the right to read a message, whether it is possible to reply to a list, and so on. There are two kinds of keys: those that begin with "message:uuid" and those that begin with "messagelist:uuid". A column of a message:uuid row is, for example, "sender" or "rawtext". A column of a messagelist:uuid row is, for example, "creator" or "participants".


MessagesTime (message_time) is the sorting mechanism: when I query message_time, I get messages or message lists in the order they were sent. There are two kinds of keys:
"messagebox:someone": for each column, the Value is the UUID of a list inside someone's message box and the Name is the UUID of the last message in the corresponding message list. This gives me a sorting mechanism based on the last message received.
"messagelist:uuid": for each column, the Name is the UUID of a message; the Value does not matter.

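The two column families described above can be sketched with plain Python dictionaries. This is only an illustration under stated assumptions, not the actual schema or client code: the dictionaries stand in for column families, time_uuid and slice_row are hypothetical helpers, and sorting on the UUID timestamp is meant to mimic how a TimeUUIDType comparator would order column names.

```python
import uuid

def time_uuid(ticks: int) -> uuid.UUID:
    """Hypothetical helper: build a version-1 UUID from an explicit timestamp
    so the example is deterministic (real code would call uuid.uuid1())."""
    fields = (ticks & 0xFFFFFFFF, (ticks >> 32) & 0xFFFF,
              (ticks >> 48) & 0x0FFF, 0x80, 0x00, 0x0123456789AB)
    return uuid.UUID(fields=fields, version=1)

m1, m2, m3 = time_uuid(100), time_uuid(200), time_uuid(300)
list_a, list_b = uuid.uuid4(), uuid.uuid4()

# "message" CF: column names are plain strings (UTF8Type).
message = {
    f"message:{m1}": {"sender": "openvictor", "rawtext": "hello"},
    f"messagelist:{list_a}": {"creator": "openvictor",
                              "participants": "openvictor,aaron"},
}

# "message_time" CF: column names are time-based UUIDs (TimeUUIDType).
message_time = {
    # Name = last message of the list, Value = UUID of the list.
    "messagebox:openvictor:nameofbox": {m3: str(list_b), m2: str(list_a)},
    # Name = UUID of a message, Value = unused.
    f"messagelist:{list_a}": {m2: "", m1: ""},
    f"messagelist:{list_b}": {m3: ""},
}

def slice_row(cf, key):
    """Return (name, value) pairs of one row, ordered by the UUID timestamp."""
    return sorted(cf.get(key, {}).items(), key=lambda col: col[0].time)

lists_by_last_message = [value for _, value in
                         slice_row(message_time, "messagebox:openvictor:nameofbox")]
```

Here lists_by_last_message holds the list UUIDs ordered by the time of their most recent message, which is the sorting behaviour described for the "messagebox" rows.
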
About your suggestion: it is a very good solution, but there is one thing I don't really like about serialization: it "blocks" evolution. Let's say I want to add one field to a message: I am obliged to write a tool that deserializes each message, adds the information, reserializes all the fields and inserts the result. Even if I serialize with JSON, it looks like evolution (which is why I chose Cassandra) is a little bit broken. If I am wrong, please tell me so.
However, I will explore this very interesting possibility for another project with "tags", which is not really subject to dramatic evolution.
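
The difference in effort described above can be sketched as follows. This is a toy comparison, assuming a row is a plain dictionary; the column name "body" and the new field "subject" are made up for illustration.

```python
import json

# Current layout: one column per field.
columns_row = {"sender": "openvictor", "rawtext": "hello"}

# Serialised layout: the whole message in a single JSON column
# ("body" is a hypothetical column name).
json_row = {"body": json.dumps({"sender": "openvictor", "rawtext": "hello"})}

def add_field_as_column(row, name, value):
    """Adding a field is one column write; existing columns are untouched."""
    row[name] = value

def add_field_to_json(row, name, value, column="body"):
    """Adding a field means read, deserialise, modify, reserialise, write back."""
    message = json.loads(row[column])
    message[name] = value
    row[column] = json.dumps(message)

add_field_as_column(columns_row, "subject", "greetings")
add_field_to_json(json_row, "subject", "greetings")
```

Both rows end up carrying the same information, but the serialised variant has to rewrite the whole value for every existing message, which is the migration tool mentioned above.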

At the moment I have no real complaints about speed, and it is not really time critical (after all, who cares if the message box loads in 250 ms instead of 200 ms?). I currently get the messages with two batch Cassandra calls, so I think this is satisfactory.

Thanks again, the JSON serialization looks like a very interesting possibility.

Victor

2011/5/19 aaron morton <aaron@thelastpickle.com>
I'm a bit confused by your examples. I think you are saying...

- Standard CF called Message using the UTF8Type for column comparisons, used to store the individual messages. Row key is the message UUID. Not sure what the columns are.
- Standard CF called MessageTime using TimeUUIDType for column comparisons, used to store collections of messages. Row key is "messagelist:<message_list_uuid>" for a message list, and "messagebox:<user_name>:<mbox_name>" for a message box. Not sure what the columns are.

The best model is going to be the one that supports your read requests and the volume of data you are expecting.

One way to go is to denormalise to support very fast read paths. You could store the entire message in one column using something like JSON to serialise it. Then

- MessageIndexes standard CF to store the full messages in context. There are three different types of rows:
       * keys with <user_name> store all messages for a user; column name is the message TimeUUID and value is the message structure
       * keys with <user_name>/<mbox_name> store the messages for a single message box. Columns same as above.
       * keys with <user_name>/<mbox_name>/<mlist_name> store the messages in a single message list. Columns as above.
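
The three row types listed above could be sketched like this. It is a rough illustration only: a dictionary stands in for the MessageIndexes CF, and index_message, read_messages and time_uuid are hypothetical helpers, not part of any Cassandra client.

```python
import json
import uuid

def time_uuid(ticks: int) -> uuid.UUID:
    """Hypothetical helper: deterministic version-1 UUID from a timestamp."""
    fields = (ticks & 0xFFFFFFFF, (ticks >> 32) & 0xFFFF,
              (ticks >> 48) & 0x0FFF, 0x80, 0x00, 0x0123456789AB)
    return uuid.UUID(fields=fields, version=1)

# row key -> {TimeUUID column name: JSON-serialised message}
message_indexes = {}

def index_message(user_name, mbox_name, mlist_name, msg_id, message):
    """Write the full message into all three rows that should contain it."""
    payload = json.dumps(message)
    for row_key in (user_name,
                    f"{user_name}/{mbox_name}",
                    f"{user_name}/{mbox_name}/{mlist_name}"):
        message_indexes.setdefault(row_key, {})[msg_id] = payload

def read_messages(row_key):
    """One row read returns complete messages, ordered by UUID timestamp."""
    row = message_indexes.get(row_key, {})
    return [json.loads(row[name]) for name in sorted(row, key=lambda u: u.time)]

index_message("openvictor", "nameofbox", "list1", time_uuid(200),
              {"sender": "aaron", "rawtext": "second"})
index_message("openvictor", "nameofbox", "list2", time_uuid(100),
              {"sender": "openvictor", "rawtext": "first"})
box_messages = read_messages("openvictor/nameofbox")
```

Each message is written three times, which is the price of answering every read with a single row lookup.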

- MessageFolders CF to store the message boxes and message lists, two approaches:
       1) <user_name> as key and each column is a message box; message lists are stored in a single column as JSON
       2) <user_name> row for the top-level message boxes, with a column for each message box, and a <user_name>/<message_box> row for the next level.
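
The two MessageFolders approaches might look as follows, again with dictionaries standing in for rows and columns. The names folders_v1, folders_v2, lists_v1 and lists_v2 are invented for this sketch, and leaving column values empty in approach 2 is an assumption.

```python
import json

# Approach 1: one row per user, one column per message box,
# the message lists of a box serialised as JSON in the column value.
folders_v1 = {
    "openvictor": {"nameofbox": json.dumps(["list1", "list2"])},
}

# Approach 2: one row per level; column names are the children.
folders_v2 = {
    "openvictor": {"nameofbox": ""},
    "openvictor/nameofbox": {"list1": "", "list2": ""},
}

def lists_v1(user_name, message_box):
    """Approach 1: read one column and deserialise it."""
    return json.loads(folders_v1[user_name][message_box])

def lists_v2(user_name, message_box):
    """Approach 2: read the column names of the second-level row."""
    return sorted(folders_v2[f"{user_name}/{message_box}"])
```

Approach 1 needs a single row read per user, while approach 2 allows adding a message list with one column write instead of rewriting a JSON value.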

Or, if space is a concern, just store the UUID of the message in the index CF and add a CF to store the messages.

It is also going to depend on the management features, e.g. can you rename a message box / list? Move messages around? If so, the denormalised pattern may not be the best, as those operations will take longer.
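
Why a rename gets slower with the denormalised layout can be sketched as below. The assumption is that row keys embed the box name as "<user_name>/<mbox_name>", so every affected row has to be copied to a new key and the old one removed; rename_message_box is a hypothetical helper operating on a dictionary, not a Cassandra call.

```python
def rename_message_box(cf, user_name, old_name, new_name):
    """Re-key every row stored under user/old_name; return the rows rewritten."""
    old_prefix = f"{user_name}/{old_name}"
    new_prefix = f"{user_name}/{new_name}"
    moved = 0
    for row_key in list(cf):  # snapshot of keys, since the dict changes below
        if row_key == old_prefix or row_key.startswith(old_prefix + "/"):
            cf[new_prefix + row_key[len(old_prefix):]] = cf.pop(row_key)
            moved += 1
    return moved

index = {
    "openvictor": {"m1": '{"rawtext": "hi"}'},
    "openvictor/nameofbox": {"m1": '{"rawtext": "hi"}'},
    "openvictor/nameofbox/list1": {"m1": '{"rawtext": "hi"}'},
    "openvictor/otherbox": {"m2": '{"rawtext": "yo"}'},
}
rows_rewritten = rename_message_box(index, "openvictor", "nameofbox", "renamed")
```

The cost grows with the number of message lists in the box and the number of full messages copied with them, whereas a layout that stores only UUIDs would rewrite far less data.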

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 19 May 2011, at 05:44, openvictor Open wrote:

> Hello all,
>
> I know organization is a broad topic and everybody may have an idea on how to do it, but I really want to have some advice and opinions, and I think it could be interesting to discuss this matter.
>
> Here is my problem: I am designing a messaging system internal to a website. There are three big structures: Message, MessageList and MessageBox. A message or message list is identified only by a UUID; a MessageBox is identified by a name (UTF-8 string). A message box holds a set of MessageLists and a message list holds a set of messages, all of them referenced by UUID.
> Currently I have only two CFs: message and message_time. message uses UTF8Type (Cassandra 0.6.11, soon moving to 0.8) and message_time uses TimeUUIDType.
>
> For example, if I want to request all messages in a certain message list, I do: message_time['messagelist:uuid(messagelist)']
> If I want the information of a message, I do: message['message:uuid(message)']
> If I want all message lists for a certain message box (called nameofbox, for user openvictor, in this example), I do: message_time['messagebox:openvictor:nameofbox']
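
The three lookups above all rely on a prefix convention for row keys. A small sketch of helpers that build those keys, with function names invented for this example:

```python
import uuid

def message_key(message_id: uuid.UUID) -> str:
    """Row key in the message CF for one message."""
    return f"message:{message_id}"

def messagelist_key(list_id: uuid.UUID) -> str:
    """Row key used in both CFs for a message list."""
    return f"messagelist:{list_id}"

def messagebox_key(user: str, box_name: str) -> str:
    """Row key in the message_time CF for one message box of a user."""
    return f"messagebox:{user}:{box_name}"

box_key = messagebox_key("openvictor", "nameofbox")
```

Keeping key construction in one place makes it harder for a "message:" key to be confused with a "messagelist:" key when two kinds of rows share a CF.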
>
> My question to Cassandra users is: is it a good idea to group all those things into two CFs? Are there advantages or drawbacks to these two CFs, and should I change my organization for the long term?
>
> Thank you,
> Victor

