From user-return-29258-apmail-cassandra-user-archive=cassandra.apache.org@cassandra.apache.org Tue Oct 2 03:20:28 2012
Return-Path:
X-Original-To: apmail-cassandra-user-archive@www.apache.org
Delivered-To: apmail-cassandra-user-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id B9044958E
    for ; Tue, 2 Oct 2012 03:20:28 +0000 (UTC)
Received: (qmail 38371 invoked by uid 500); 2 Oct 2012 03:20:26 -0000
Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org
Received: (qmail 38152 invoked by uid 500); 2 Oct 2012 03:20:24 -0000
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: user@cassandra.apache.org
Delivered-To: mailing list user@cassandra.apache.org
Received: (qmail 38128 invoked by uid 99); 2 Oct 2012 03:20:23 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 02 Oct 2012 03:20:23 +0000
X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS
X-Spam-Check-By: apache.org
Received-SPF: pass (athena.apache.org: domain of roshni_rajagopal@hotmail.com designates 65.55.34.144 as permitted sender)
Received: from [65.55.34.144] (HELO col0-omc3-s6.col0.hotmail.com) (65.55.34.144)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 02 Oct 2012 03:20:18 +0000
Received: from COL121-W21 ([65.55.34.136]) by col0-omc3-s6.col0.hotmail.com
    with Microsoft SMTPSVC(6.0.3790.4675); Mon, 1 Oct 2012 20:19:57 -0700
Message-ID:
Content-Type: multipart/alternative; boundary="_ed9ac1c5-f7c8-4faa-8047-c350d9a8c947_"
X-Originating-IP: [122.179.96.213]
From: Roshni Rajagopal
To:
Subject: RE: Data Modeling: Comments with Voting
Date: Tue, 2 Oct 2012 08:49:56 +0530
Importance: Normal
In-Reply-To:
References: <50639F7D.5080004@mustardgrain.com> <360B2221-3DA3-46A1-BF28-8EA50DF56A22@venarc.com>,
MIME-Version: 1.0
X-OriginalArrivalTime: 02 Oct 2012 03:19:57.0863 (UTC) FILETIME=[CCDD5B70:01CDA04C]
X-Virus-Checked: Checked by ClamAV on apache.org

--_ed9ac1c5-f7c8-4faa-8047-c350d9a8c947_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi,

To explain my suggestions, my thinking was:

a) You need to store entity-type information about a comment, such as
date created, comment text, and commented by. I can't think of any
other master information for a comment, but in general one starts with
entities in a standard static column family. If you store an entity in
a dynamic, denormalized form, then whenever master data changes you
need to iterate across all rows and update it, which is expensive in
Cassandra. Here the comment text is editable.

b) So when a comment is created, it goes into the static column family.
An entry is also made in the dynamic sort_by_time_list column family,
with the time created as the column name. I didn't suggest combining
(a) and (c), so that master information remains in one place. The other
approach would be to store the comment as JSON in the column value.
However, if you need to update the comment text, it would be hard to
identify the comment column and update it.

c) When a comment gets a vote, the counter column family is incremented
to track the number of votes for the comment. To sort by number of
votes, after incrementing the counter you also need to write the
current number of votes and the comment id to column family (d). I see
now that you also need to delete the old (number of votes, comment id)
column and add a new column with the current number of votes and
comment id. The row is then sorted by number of votes.
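To make the bookkeeping in (c) concrete, here is a minimal sketch that
uses plain Python structures as stand-ins for the column families. The
names vote_counts, sort_by_votes, add_comment, vote and top_comments
are hypothetical and are not Cassandra client calls. It runs in a
single thread, so it says nothing about two users voting on the same
comment at the same time.

```python
import bisect

# Hypothetical in-memory stand-ins for the column families; these names
# are illustrative only and are not part of any Cassandra API.
vote_counts = {}    # counter CF: comment id -> number of votes
sort_by_votes = []  # one wide row: sorted (votes, comment_id) column names


def add_comment(comment_id):
    """Register a new comment with zero votes."""
    vote_counts[comment_id] = 0
    bisect.insort(sort_by_votes, (0, comment_id))


def vote(comment_id):
    """Increment the counter, then swap the old index column for a new one."""
    old = vote_counts[comment_id]
    new = old + 1
    vote_counts[comment_id] = new
    # Delete the stale (old votes, comment id) column ...
    pos = bisect.bisect_left(sort_by_votes, (old, comment_id))
    if pos < len(sort_by_votes) and sort_by_votes[pos] == (old, comment_id):
        del sort_by_votes[pos]
    # ... and add the column carrying the current count.
    bisect.insort(sort_by_votes, (new, comment_id))


def top_comments(limit):
    """Return comment ids ordered by votes, highest first."""
    return [cid for _, cid in reversed(sort_by_votes)][:limit]
```

Each vote costs one counter increment, one delete and one insert, and
the index only stays free of duplicates if the delete always targets
the column that was written for the previous count.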

If there are many ways to sort, it's better to do it in the application
to avoid having a new column family for each type of sort. However, I'm
not certain which approach would perform better over time and at
volume. Sorting can be complex; see Aaron's blog post:
http://thelastpickle.com/2012/08/18/Sorting-Lists-For-Humans/

I welcome any feedback on my suggestions.

From: aaron@thelastpickle.com
Subject: Re: Data Modeling: Comments with Voting
Date: Tue, 2 Oct 2012 10:39:42 +1300
To: user@cassandra.apache.org

You cannot (and probably do not want to) sort continually while voting
is going on.

You can store the votes using CounterColumnType in column values. When
someone votes, you then (somehow) queue a job that reads the vote
counts for the post / comment, pivots and sorts on the vote count, and
then writes the updated leader board to Cassandra.

Alternatively, if you have a small number of comments for a post, just
read all the votes and sort them as part of the read.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/09/2012, at 8:25 AM, Drew Kutcharian wrote:

Thanks Roshni,

I'm not sure how #d will work when users are actually voting on a
comment. What happens when two users vote on the same comment
simultaneously? How do you update the entries in the #d column family
to prevent duplicates?

Also, #a and #c can be combined by using TimeUUIDs as comment ids.

- Drew

On Sep 27, 2012, at 2:13 AM, Roshni Rajagopal wrote:

Hi Drew,

I think you have 4 requirements. Here are my suggestions.

a) Store comments: have a static column family for comments with master
data like created date, created by, length, etc.
b) When a person votes for a comment, increment a vote counter: have a
counter column family for incrementing the votes for each comment.
c) Display comments sorted by date created: have a column family with a
dummy row id 'sort_by_time_list'; column names can be the date created
(TimeUUID), and the column value can be the comment id.
d) Display comments sorted by number of votes: have a column family
with a dummy row id 'sort_by_votes_list'; column names can be a
composite of number of votes and comment id (as more than one comment
can have the same number of votes).

Regards,
Roshni

> Date: Wed, 26 Sep 2012 17:36:13 -0700
> From: kirk@mustardgrain.com
> To: user@cassandra.apache.org
> CC: drew@venarc.com
> Subject: Re: Data Modeling: Comments with Voting
>
> Depending on your needs, you could simply duplicate the comments in two
> separate CFs with the column names including time in one and the vote in
> the other. If you allow for updates to the comments, that would pose
> some issues you'd need to solve at the app level.
>
> On 9/26/12 4:28 PM, Drew Kutcharian wrote:
> > Hi Guys,
> >
> > Wondering what would be the best way to model a flat (no sub
> > comments, e.g. Twitter) comments list with support for voting
> > (where I can sort by create time or votes) in Cassandra?
> >
> > To demonstrate:
> >
> > Sorted by create time:
> > - comment 1 (5 votes)
> > - comment 2 (1 vote)
> > - comment 3 (no votes)
> > - comment 4 (10 votes)
> >
> > Sorted by votes:
> > - comment 4 (10 votes)
> > - comment 1 (5 votes)
> > - comment 2 (1 vote)
> > - comment 3 (no votes)
> >
> > It's the sorted-by-votes that I'm having a bit of trouble with. I'm
> > looking for a roll-your-own approach and prefer not to use
> > secondary indexes and CQL sorting.
> >
> > Thanks,
> >
> > Drew
> >
>

--_ed9ac1c5-f7c8-4faa-8047-c350d9a8c947_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
--_ed9ac1c5-f7c8-4faa-8047-c350d9a8c947_--