Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 479A561B6 for ; Wed, 18 May 2011 18:42:59 +0000 (UTC) Received: (qmail 79098 invoked by uid 500); 18 May 2011 18:42:57 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 79070 invoked by uid 500); 18 May 2011 18:42:57 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 79061 invoked by uid 99); 18 May 2011 18:42:57 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 May 2011 18:42:57 +0000 X-ASF-Spam-Status: No, hits=3.7 required=5.0 tests=FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of rajkumar.w93@gmail.com designates 209.85.161.44 as permitted sender) Received: from [209.85.161.44] (HELO mail-fx0-f44.google.com) (209.85.161.44) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 May 2011 18:42:50 +0000 Received: by fxm15 with SMTP id 15so1631287fxm.31 for ; Wed, 18 May 2011 11:42:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type; bh=YO/E3JqydKD+Rya3K2qVEQaoDrp8matjkC3USaudBlc=; b=KhE7LzKLsIWJHfiieYq6tEEgy8LFwV6aF6FinZa62RLq0nZfjQ5sQECkYyntMNzhEm GJPNfTqxDZ974bponX6fOKa0+jjftxRW8V8B0vAnxiVrMPu3snJhLA5A/WvsU7fQc5tN IuHUYEx67bmEEZKgV4UpM88/BbVY76JzArAoo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type; b=LtRXXYK9Xv687SoafW+kNIC6NR5YcSqzUBRyCQyhu3PcsXhm6JXB/oFfQM4uLlchrm PWJ/PGNKrngbOWjYE6oXf+iAyPmaGXgqDn4tsiWs7kxbBiUHD3tSDnf/1cE7g+ap/j3N Rj4+AZPcXxzhWSgTgn3b7PpSa9rHqE+8ZClJU= MIME-Version: 1.0 Received: by 10.223.6.11 with SMTP id 11mr2786556fax.100.1305744149939; Wed, 18 May 2011 11:42:29 -0700 (PDT) Sender: rajkumar.w93@gmail.com Received: by 10.223.122.71 with HTTP; Wed, 18 May 2011 11:42:29 -0700 (PDT) In-Reply-To: References: Date: Thu, 19 May 2011 00:12:29 +0530 X-Google-Sender-Auth: YInvW83N1NX8NwxkiNibJtR7t9Q Message-ID: Subject: Re: Design for 'Most viewed Discussions' in a forum From: Aditya Narayan To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=00151747849e3984c104a3914089 X-Virus-Checked: Checked by ClamAV on apache.org --00151747849e3984c104a3914089 Content-Type: text/plain; charset=ISO-8859-1 Thanks victor! Aren't there any good ways by using Cassandra alone ? On Wed, May 18, 2011 at 11:41 PM, openvictor Open wrote: > Have you thought about user another kind of Database, which supports > volative content for example ? > > I am currently thinking about doing something similar. The best and > simplest option at the moment that I can think of is Redis. In redis you > have the option of querying keys with wildcards. Your problem can be done by > just inserting an UUID into Redis for a certain amount of time ( the best is > to tailor this amount of time as an inverse function of the number of keys > existing in Redis). > > *With Redis* > What I would do : I cut down time in pieces of X minutes ( 15 minutes, for > example by truncating a timestamp). Let timestampN be the timestamp for the > period of time ( [N,N+15] ), let Topic1 Topic2 be two topics then : > > One or more people will view Topic 1 then Topic2 then again Topic1 in this > period of 15 minutes > (HINCRBY is the Increment) > H INCRBY topics:Topic1:timestampN > viewcount 1 > H INCRBY topics:Topic2:timestampN > viewcount 1 > H INCRBY topics:Topic1:timestampN > viewcount 1 > > Then you just query in the following way : > > MGET topics:*:timestampN > > * is the wildcard, you order by viewcount and you have what you are asking > for ! > This is a simplified version of what you should do but personnally I really > like the combination of Cassandra and Redis. > > > Victor > > 2011/5/18 Aditya Narayan > >> I would arrange for memtable flush period in such a manner that the time >> period for which these most viewed discussions are generated equals the >> memtable flush timeperiod, so that the entire row of most viewed discussion >> on a topic is in one or maximum two memtables/ SST tables. >> This would also help minimize several versions of the same column in the >> row parts in different SST tables. >> >> >> >> On Wed, May 18, 2011 at 11:04 PM, Aditya Narayan wrote: >> >>> ************* >>> For a discussions forum, I need to show a page of most viewed >>> discussions. >>> >>> For implementing this, I maintain a count of views of a discussion & when >>> this views count of a discussion passes a certain threshold limit, the >>> discussion Id is added to a row of most viewed discussions. >>> >>> This row of most viewed discussions contains columns with Integer names & >>> values containing serialized lists of Ids of all discussions whose views >>> count equals the Integral name of this column. >>> >>> Thus if the view count of a discussion increases I'll need to move its >>> 'Id' from serialized list in some column to serialized list in another >>> column whose name represents the updated views count on that discussion. >>> >>> Thus I can get the most viewed discussions by getting the appropriate no >>> of columns from one end of this Integer sorted row. >>> >>> ************ >>> >>> I wanted to get feedback from you all, to know if this is a good design. >>> >>> Thanks >>> >>> >>> >>> >>> >>> >> > --00151747849e3984c104a3914089 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Thanks victor!

Aren't there any good ways by using Cassandra alo= ne ?

On Wed, May 18, 2011 at 11:41 PM, op= envictor Open <openvictor@gmail.com> wrote:
Have you thought about user another kind of= Database, which supports volative content for example ?

I am curre= ntly thinking about doing something similar. The best and simplest option a= t the moment that I can think of is Redis. In redis you have the option of = querying keys with wildcards. Your problem can be done by just inserting an= UUID into Redis for a certain amount of time ( the best is to tailor this = amount of time as an inverse function of the number of keys existing in Red= is).

With Redis
What I would do : I cut down time in pieces of X m= inutes ( 15 minutes, for example by truncating a timestamp). Let timestampN= be the timestamp for the period of time ( [N,N+15] ), let Topic1 Topic2 be= two topics then :

One or more people will view Topic 1 then Topic2 then again Topic1 in t= his period of 15 minutes
(HINCRBY is the Increment)
HINCRBY <= span style=3D"font-family:courier new,monospace">topics:Topic1:timestampN v= iewcount 1
= HINCRBY<= /a> topics:Topic2:timestampN viewcount 1
= HINCRBY<= /a> topics:Topic1:timestampN viewcount 1


Then you just query in the following way :

=
MGET topics:*:timestampN

* is the wildcard, you order by viewcount and you have what you are ask= ing for !
This is a simplified version of what you should do but personn= ally I really like the combination of Cassandra and Redis.


Victor

2011/5/18 Aditya Narayan &= lt;adynnn@gmail.com>
I would arrange for memtable flush period in such a manner that the time pe= riod for which these most viewed discussions are generated equals the memta= ble flush timeperiod, so that the entire row of most viewed discussion on a= topic is in one or maximum two memtables/ SST tables.
This would also help minimize several versions of the same column in the ro= w parts in different SST tables.



On Wed, May 18, 2011 at 11:04 PM, Aditya Narayan <adynnn@g= mail.com> wrote:
*************
For a discussions forum, I = need to show a page of most viewed discussions.

For implementing thi= s, I maintain a count of views of a discussion & when this views count = of a discussion passes a certain threshold limit, the discussion Id is adde= d to a row of most viewed discussions.

This row of most viewed discussions contains columns with Integer names= & values containing serialized lists of Ids of all discussions whose v= iews count equals the Integral name of this column.

Thus if the view= count of a discussion increases I'll need to move its 'Id' fro= m serialized list in some column to serialized list in another column whose= name represents the updated views count on that discussion.

Thus I can get the most viewed discussions by getting the appropriate n= o of columns from one end of this Integer sorted row.

************
I wanted to get feedback from you all, to know if this is a good desi= gn.

Thanks








--00151747849e3984c104a3914089--