Message-ID: <4D497CED.9080400@gmail.com>
Date: Wed, 02 Feb 2011 10:49:01 -0500
From: William R Speirs
To: user@cassandra.apache.org
Subject: Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?
References: <4D497720.3090800@gmail.com>

Any time I see/hear "a single row containing all ..." I get nervous. That single row is going to reside on a single node. That is potentially a lot of load (don't know the system) for that single node. Why wouldn't you split it by at least user?

If it won't be a lot of load, then why are you using Cassandra? This seems like something that could easily fit into an SQL/relational style DB. If it's too much data (millions of users, 100s of millions of reminders) for a standard SQL/relational model, then it's probably too much for a single row.

I'm not familiar with the TTL functionality of Cassandra... sorry cannot help/comment there, still learning :-)

Yea, my $0.02 is that this is an effective way to leverage super columns.

Bill-

On 02/02/2011 10:43 AM, Aditya Narayan wrote:
> I think you got it exactly what I wanted to convey except for few
> things I want to clarify:
>
> I was thinking of a single row containing all reminders (& not split
> by day). History of the reminders need to be maintained for some time.
> After certain time (say 3 or 6 months) they may be deleted by ttl
> facility.
>
> "While presenting the reminders timeline to the user, latest
> supercolumns like around 50 from the start_end will be picked up and
> their subcolumns values will be compared to the Tags user has chosen
> to see and, corresponding to the filtered subcolumn values(tags), the
> rows of the reminder details would be picked up.."
>
> Is supercolumn a preferable choice for this ? Can there be a better
> schema than this ?
>
>
> -Aditya Narayan
>
>
>
> On Wed, Feb 2, 2011 at 8:54 PM, William R Speirs wrote:
>> To reiterate, so I know we're both on the same page, your schema would be
>> something like this:
>>
>> - A column family (as you describe) to store the details of a reminder. One
>> reminder per row. The row key would be a TimeUUID.
>>
>> - A super column family to store the reminders for each user, for each day.
>> The row key would be something like: YYYYMMDD:user_id. The column names
>> would simply be the TimeUUID of the messages. The sub column names would be
>> the tag names of the various reminders.
>>
>> The idea is that you would then get a slice of each row for a user, for a
>> day, that would only contain sub column names with the tags you're looking
>> for? Then based upon the column names returned, you'd look-up the reminders.
>>
>> That seems like a solid schema to me.
>>
>> Bill-
>>
>> On 02/02/2011 09:37 AM, Aditya Narayan wrote:
>>>
>>> Actually, I am trying to use Cassandra to display to users on my
>>> application, the list of all Reminders set by themselves for
>>> themselves, on the application.
>>>
>>> I need to store rows containing the timeline of daily Reminders put by
>>> the users, for themselves, on application. The reminders need to be
>>> presented to the user in a chronological order like a news feed.
>>> Each reminder has got certain tags associated with it(so that, at
>>> times, user may also choose to see the reminders filtered by tags in
>>> chronological order).
>>>
>>> So I thought of a schema something like this:-
>>>
>>> -Each Reminder details may be stored as separate rows in column family.
>>> -For presenting the timeline of reminders set by user to be presented
>>> to the user, the timeline row of each user would contain the Id/Key(s)
>>> (of the Reminder rows) as the supercolumn names and the subcolumns
>>> inside that supercolumns could contain the list of tags associated
>>> with particular reminder. All tags set at once during first write. The
>>> no of tags(subcolumns) will be around 8 maximum.
>>>
>>> Any comments, suggestions and feedback on the schema design are
>>> requested..
>>>
>>> Thanks
>>> Aditya Narayan
>>>
>>>
>>> On Wed, Feb 2, 2011 at 7:49 PM, Aditya Narayan wrote:
>>>>
>>>> Hey all,
>>>>
>>>> I need to store supercolumns each with around 8 subcolumns;
>>>> All the data for a supercolumn is written at once and all subcolumns
>>>> need to be retrieved together. The data in each subcolumn is not big,
>>>> it just contains keys to other rows.
>>>>
>>>> Would it be preferred to have a supercolumn family or just a standard
>>>> column family containing "all the subcolumns data serialized in single
>>>> column(s) " ?
>>>>
>>>> Thanks
>>>> Aditya Narayan
>>>>
>>
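
For reference, the two-column-family layout discussed in this thread can be sketched roughly as below. This is only an illustration using plain Python dicts, not a Cassandra client API; the names `reminders`, `timeline`, `timeline_key`, `add_reminder` and `reminders_for` are all made up for the example, and `uuid.uuid1()` stands in for a TimeUUID. The per-day, per-user row key follows the YYYYMMDD:user_id suggestion above.

```python
import uuid
from collections import defaultdict
from datetime import date

# Hypothetical in-memory stand-ins for the two column families.
# These are plain dicts, NOT a Cassandra client API.
reminders = {}                # row key: TimeUUID -> {column name: value}
timeline = defaultdict(dict)  # row key "YYYYMMDD:user_id"
                              #   -> {TimeUUID (super column): {tag (sub column): ""}}


def timeline_key(day: date, user_id: str) -> str:
    """Build the per-user, per-day row key, e.g. '20110202:alice'."""
    return f"{day:%Y%m%d}:{user_id}"


def add_reminder(user_id: str, day: date, text: str, tags: list) -> uuid.UUID:
    """Write the reminder details row and its timeline entry in one go."""
    reminder_id = uuid.uuid1()  # time-based UUID, standing in for a TimeUUID
    reminders[reminder_id] = {"text": text}
    # Tags are stored as sub column names; the values are left empty.
    timeline[timeline_key(day, user_id)][reminder_id] = {tag: "" for tag in tags}
    return reminder_id


def reminders_for(user_id: str, day: date, wanted_tags: set) -> list:
    """Slice one user's row for one day, keep entries carrying any wanted tag,
    then look up the reminder details by the super column names."""
    row = timeline.get(timeline_key(day, user_id), {})
    matches = [rid for rid, tags in row.items()
               if any(tag in tags for tag in wanted_tags)]
    matches.sort(key=lambda rid: rid.time)  # order by UUID timestamp
    return [reminders[rid] for rid in matches]
```

Because the row key includes both the day and the user, no single row (and so no single node) ends up holding every reminder, which is the concern raised in the reply at the top of this message.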