Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 40801 invoked from network); 21 Sep 2010 14:28:01 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 21 Sep 2010 14:28:01 -0000 Received: (qmail 8274 invoked by uid 500); 21 Sep 2010 14:27:59 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 8085 invoked by uid 500); 21 Sep 2010 14:27:57 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 7687 invoked by uid 99); 21 Sep 2010 14:27:55 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 21 Sep 2010 14:27:55 +0000 X-ASF-Spam-Status: No, hits=4.0 required=10.0 tests=FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS,T_FILL_THIS_FORM_SHORT,T_TO_NO_BRKTS_FREEMAIL,URIBL_SBL X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of juho.makinen@gmail.com designates 209.85.216.44 as permitted sender) Received: from [209.85.216.44] (HELO mail-qw0-f44.google.com) (209.85.216.44) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 21 Sep 2010 14:27:51 +0000 Received: by qwc9 with SMTP id 9so4924070qwc.31 for ; Tue, 21 Sep 2010 07:27:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=cBp1D8WV9uY1143eISSkQzWaHaO9JlVCtndcE36mDLQ=; b=TToSG3+DhMP2t8vl9e6tK+BZREVhZ9TZaW6GSXQ+zuKCLzwOapSQv4tvwtZbvjahTn uK9xmQ8KVyLMQn3UnTrrlUzqWT1RVptA0bItUf8wLxOhxbrQ+C5sFaAsqbHnbOYHkCyB 0D59/WQh1h+JcTDhhdVOU4lRKOi2UlOx2PMfE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=PccvOiW/Dx1ZFW2kivknqZ7yVcgMC8H4rVcaZw4su6fDxEQCZkkzqE6NZwwLj+DbYG zjzrSFjTC8KOVflRJJiMX46+1HEcGLoyyQ4WgCBebqdU13TyeQ6zk467Xhoy06v3J2iT 2J6awLxiY4LlCGc/SvIeNUf5//xItUsEQcnes= MIME-Version: 1.0 Received: by 10.224.19.200 with SMTP id c8mr7092198qab.202.1285079250353; Tue, 21 Sep 2010 07:27:30 -0700 (PDT) Received: by 10.229.59.103 with HTTP; Tue, 21 Sep 2010 07:27:30 -0700 (PDT) In-Reply-To: <4C98BACF.6030609@monit.dk> References: <4C973026.8020203@monit.dk> <4C98BACF.6030609@monit.dk> Date: Tue, 21 Sep 2010 17:27:30 +0300 Message-ID: Subject: Re: Schema question From: =?ISO-8859-1?Q?Juho_M=E4kinen?= To: user@cassandra.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On Tue, Sep 21, 2010 at 5:01 PM, Morten Wegelbye Nissen wrot= e: > There is a point here that is very important. The key, is erhhm the key t= o > success. Ie. you must build the key in a way where you can find it again. Yes. You must index your data (choose your key and column names) in such that it servers you in the way you want to retrieve the data. In some cases it means duplicating some data into multiple places but usually that's not a problem. You can be creative by using JSON to store a lot of columns in your payload and add it's also possible to add them dynamically later. > In case you create a system for login, you would most likely have the log= in > name as key. ( And maybe here link that to a userid that will be used for > keys later on ) Yes, you could have a CF which has user id (integer, uuid or similar) as the key and the values would host user data like email, name, age, hashed password etc and then another CF which has email as the key and a single column which contains the user uid. This allows you to fetch the user by email on the login and still allows you to change the email later. > Is it correctly understand that instead of having this magic formatted js= on > text in a column, a supercolumn could have serve you? Yes but that wouldn't benefit us so I just choosed to use simple CF with JSON as column payload as they're easier to handle. Also check my post from today "Cassandra operation success ratio survey results". - Juho M=E4kinen > > ./Morten > > On 20-09-2010 12:37, Juho M=E4kinen wrote: >> >> We have built a facebook style "messenger" into our web site which >> uses cassandra as storage backend with two column families: >> TalkMessages and TalkLastMessages. I've uploaded a screenshot showing >> the feature in action to >> http://img138.imageshack.us/img138/3807/talkexample.jpg >> >> TalkMessages contains each message between two participants. The key >> is a string built from the two users uids "$smaller_uid:$bigger_uid". >> Each column inside this CF contains a single message. The column name >> is the message timestamp in microseconds since epoch stored as >> LongType. The column value is a JSON encoded string containing >> following fields: sender_uid, target_uid, msg. >> >> This results in following structure inside the column family. >> >> "2249:9111" =3D> =A0[ >> =A0 12345678 : { sender_uid : 2249, target_uid : 9111, msg : "Hello, how >> are you?" }, >> =A0 12345679 : { sender_uid : 9111, target_uid : 2249, msg : "I'm fine, >> thanks" } >> ] >> >> TalkLastMessages is used to quickly fetch users talk partners, the >> last message which was sent between the peers and other similar data. >> This allows us to quickly fetch all needed data which is needed to >> display a "main view" for all online friends with just one query to >> cassandra. This column family uses the user uid as is key. Each column >> represents a talk partner whom the user has been talking to and it >> uses the talk partner uid as the column name. Column value is a json >> packed structure which contains following fields: >> =A0- last message timestamp: microseconds since epoch when a message was >> last sent between these two users. >> =A0- unread timestamp : microseconds since epoch when the first unread >> message was sent between these two users. >> =A0- unread : counter how many unread messages there are. >> =A0- last message : last message between these two users. >> >> This results in following structure inside the column family for these >> two example users: 2249 and 9111. >> >> "2249" =3D> =A0[ >> =A0 9111 : { last_message_timestamp : 12345679, unread_timestamp : >> 12345679, unread : 1, last_message: "I'm fine, thanks" } >> >> ], >> "9111" =3D> =A0[ >> =A0 2249 : { last_message_timestamp : =A012345679, unread_timestamp : >> 12345679, unread : 0, last_message: "I'm fine, thanks" } >> ] >> >> Displaying chat (this happends on every page load, needs to be fast) >> =A01) Fetch all columns from TalkLastMessages for the user >> >> Display messages history between two participants: >> =A01) Fetch last n columns from TalkMessages for the relevant >> "$smaller_uid:$bigger_uid" row. >> >> Mark all sent messages from another participant as read (when you read >> the messages) >> =A01) Get column $sender_uid from row $reader_uid from TalkLastMessages >> =A02) Update the JSON payload and insert the column back >> >> Sending message involves the following operations: >> =A01) Insert new column to TalkMessages >> =A02) Fetch relevant column from TalkLastMessages from $target_uid row >> with $sender_uid column >> =A03) Update the column json payload and insert it back to TalkLastMessa= ges >> =A04) Fetch relevant column from TalkLastMessages from $sender_uid row >> with $target_uid column >> =A05) Update the column json payload and insert it back to TalkLastMessa= ges >> >> There are also other operations and the actual payload is a bit more >> complex. >> >> I'm happy to answer questions if somebody is interested :) >> >> =A0- Juho M=E4kinen >> >> >> >> On Mon, Sep 20, 2010 at 12:57 PM, Morten Wegelbye Nissen >> =A0wrote: >>> >>> =A0Hello List, >>> >>> No matter where you read, you almost every-where read the the noSQL >>> datascema is completely different from the relational way - and after a >>> little insight in cassandra everyone can 2nd that. >>> >>> But I miss to see some real-life examples on how a real system can be >>> modelled. Lets take the example for a system where users can send >>> messages >>> to each other. ( Completely imaginary, noone would use cassandra for a >>> mailsystem :) ) >>> >>> If one should create such a system, what CF's would be used? And how >>> would >>> you per example find all not read messages? >>> >>> ./Morten >>> > >