incubator-cassandra-user mailing list archives

From openvictor Open <openvic...@gmail.com>
Subject Re: Recommandation on how to organize CF
Date Wed, 25 May 2011 14:57:54 GMT
Thanks Aaron,

Sorry I didn't see your message sooner.

So the CF Messages, which uses UTF8Type, holds information such as who has
the right to read, whether it is possible to reply to this list, etc. There
are two kinds of keys: those that begin with "message:uuid" and those that
begin with "messagelist:uuid". A column of message:uuid is, for example,
"sender" or "rawtext". A column of messagelist:uuid is, for example,
"creator" or "participants".


MessagesTime (message_time) is the sorting mechanism, meaning that when I
query message_time I get messages or messagelists in the order they were
sent. There are 2 kinds of keys:
"messagebox:someone": for each column, the Name is the uuid of the last
message in the corresponding messagelist and the Value is the uuid of a list
inside someone's messagebox. It gives me a sorting mechanism based on the
last message received.
"messagelist:uuid": for each column, the Name is the UUID of a message and
the Value does not matter.
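
The layout described above can be sketched with plain Python dictionaries.
This is only an in-memory illustration, not Cassandra client code: the names
add_message and in_time_order are hypothetical, and dropping the previous
"last message" column of a list from the messagebox row is an assumption the
description does not state.

```python
import uuid

# Hypothetical in-memory stand-ins for the two column families:
# each maps row key -> {column name -> column value}.
messages = {}       # column names are strings (UTF8Type)
message_time = {}   # column names are time-based UUIDs (TimeUUIDType)


def add_message(owner, list_id, sender, rawtext, msg_id=None):
    """Store one message and index it in its list and in the owner's box."""
    if msg_id is None:
        msg_id = uuid.uuid1()  # version-1 UUIDs embed a timestamp
    messages["message:%s" % msg_id] = {"sender": sender, "rawtext": rawtext}
    # messagelist row: Name = message UUID, Value = unused.
    message_time.setdefault("messagelist:%s" % list_id, {})[msg_id] = ""
    # messagebox row: Name = UUID of the list's last message, Value = list UUID.
    box = message_time.setdefault("messagebox:%s" % owner, {})
    # Assumption: the column for the list's previous last message is dropped,
    # so each list shows up only once in the box.
    for name in [n for n, v in box.items() if v == list_id]:
        del box[name]
    box[msg_id] = list_id
    return msg_id


def in_time_order(row):
    """Return (name, value) pairs sorted by the timestamp inside each name."""
    return sorted(row.items(), key=lambda kv: kv[0].time)
```

Reading in_time_order(message_time["messagebox:someone"]) then yields the
lists of that box, oldest activity first.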

Your suggestion is a very good solution, but there is one thing I don't
really like about serialization: it "blocks" evolution. Let's say I would
like to add one field to a message: I am obliged to write a tool to
deserialize, add the information, reserialize all the fields and insert.
Even if I serialize with JSON, it looks like evolution (which is why I chose
Cassandra) is a little bit broken. If I am wrong, please tell me so.
However, I will explore this very interesting possibility for another
project with "tags", which is not really subject to dramatic evolution.
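
As a rough illustration of the evolution concern, the sketch below shows the
rewrite pass (deserialize, add the field, reserialize) next to one possible
alternative, where the reader fills in a default for fields that old blobs
lack. The "priority" field, the DEFAULTS table and the function names are all
hypothetical and not part of the schema discussed in this thread.

```python
import json

# Hypothetical field introduced after some messages were already stored.
DEFAULTS = {"priority": "normal"}


def serialize(message):
    """Pack a whole message into one JSON string (one column value)."""
    return json.dumps(message, sort_keys=True)


def deserialize(blob):
    """Unpack a message, filling in defaults for fields older blobs lack."""
    message = dict(DEFAULTS)
    message.update(json.loads(blob))
    return message


def migrate(rows):
    """The rewrite pass: deserialize, add the field, reserialize every row."""
    return {key: serialize(deserialize(blob)) for key, blob in rows.items()}
```

With the tolerant reader, old and new blobs can coexist; the rewrite pass is
then only needed if the stored data itself must carry the new field.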

At the moment I can't really complain about speed, and it is not really
time critical (after all, who cares if the messagebox loads in 250 ms
instead of 200 ms). At the moment I get the messages with two batch
Cassandra calls, so I think this is satisfactory.
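
A two-call read path of this kind might look like the sketch below. It runs
against plain dictionaries rather than a real cluster; get_row and multiget
are hypothetical stand-ins for a slice call and a batch read, not functions
of any Cassandra client, and the exact split between the two calls is an
assumption.

```python
import uuid


def get_row(cf, key):
    """Stand-in for one call that reads all columns of a single row."""
    return cf.get(key, {})


def multiget(cf, keys):
    """Stand-in for one batch call that reads several rows at once."""
    return {k: cf[k] for k in keys if k in cf}


def load_message_list(message_cf, message_time_cf, list_id):
    """Return the messages of one list, oldest first, using two calls."""
    # Call 1: the messagelist row; its column names are the message UUIDs.
    row = get_row(message_time_cf, "messagelist:%s" % list_id)
    ids = sorted(row, key=lambda u: u.time)  # TimeUUID order
    # Call 2: fetch every referenced message row in one batch.
    keys = ["message:%s" % i for i in ids]
    found = multiget(message_cf, keys)
    return [found[k] for k in keys if k in found]
```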

Thanks again, the JSON serialization looks like a very interesting
possibility.

Victor

2011/5/19 aaron morton <aaron@thelastpickle.com>

> I'm a bit confused by your examples. I think you are saying...
>
> - Standard CF called Message using the UTF8Type for column comparisons, used
> to store the individual messages. Row key is the message UUID. Not sure what
> the columns are.
> - Standard CF called MessageTime using TimeUUIDType for column comparisons,
> used to store collections of messages. Row key is
> "messagelist:<message_list_uuid>" for a message list, and
> "messagebox:<user_name>:<mbox_name>" for a message box. Not sure what the
> columns are.
>
> The best model is going to be the one that supports your read requests and
> the volume of data you are expecting.
>
> One way to go is to denormalise to support very fast read paths. You could
> store the entire message in one column using something like JSON to
> serialise it. Then
>
> - MessageIndexes standard CF to store the full messages in context, there
> are three different types of rows:
>        * keys with <user_name>  store all messages for a user, column name
> is the message TimeUUID and value is the message structure
>        * keys with <user_name>/<mbox_name> store the messages for a single
> message box. Columns same as above.
>        * keys with <user_name>/<mbox_name>/<mlist_name> store the messages
> in a single message list. Columns as above.
>
> - MessageFolders CF to store the message box and message lists, two
> approaches:
>        1) <user_name> as key and each column is a message box, message
> lists are stored in a single column as JSON
>        2) <user_name> row for the top level message box, column for each
> message box. <user_name>/<message_box> for the next level,
>
> Or if space is a concern just store the UUID of the message in the index CF
> and add a CF to store the messages.
>
> It is also going to depend on the management features, e.g. can you rename
> a message box / list? Move messages around? If so, the denormalised pattern
> may not be the best, as those operations will take longer.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19 May 2011, at 05:44, openvictor Open wrote:
>
> > Hello all,
> >
> > I know organization is a broad topic and everybody may have an idea on
> > how to do it, but I would really like some advice and opinions, and I
> > think it could be interesting to discuss this matter.
> >
> > Here is my problem: I am designing a messaging system internal to a
> > website. There are 3 big structures: Message, MessageList and
> > MessageBox. A message/messagelist is identified only by a UUID; a
> > MessageBox is identified by a name (UTF-8 string). A messagebox has a
> > set of MessageLists in it and a messagelist has a set of messages in it,
> > all of them being UUIDs.
> > Currently I have only two CFs: message and message_time. Message is a
> > UTF8Type (cassandra 0.6.11, soon going to 0.8) and message_time is a
> > TimeUUIDType.
> >
> > For example, if I want to request all messages in a certain messagelist
> > I do: message_time['messagelist:uuid(messagelist)']
> > If I want information about a message I do:
> > message['message:uuid(message)']
> > If I want all messagelists for a certain messagebox (called nameofbox
> > for user openvictor in this example) I do:
> > message_time['messagebox:openvictor:nameofbox']
> >
> > My question to Cassandra users is: is it a good idea to group all
> > those things into two CFs? Are there advantages / drawbacks to these
> > two CFs, and for the long term should I change my organization?
> >
> > Thank you,
> > Victor
>
>
