From: Tyler Hobbs
To: user@cassandra.apache.org
Date: Sun, 6 Mar 2011 20:40:26 -0600
Subject: Re: Designing a decent data model for an online music shop...confused/stuck on decisions

Regarding PHP performance with Cassandra, THRIFT-638 was recently resolved, and it shows some big performance improvements. I'll be upgrading the Thrift package that ships with phpcassa soon to include this fix, so you may want to compare performance numbers before and after.

On Sun, Mar 6, 2011 at 8:03 PM, Courtney wrote:

> We're in a bit of a predicament: we have an e-music store currently built
> in PHP using CodeIgniter/MySQL. The current system has 100K+ users and a
> decent song collection. Over the last few months I've been playing with
> Cassandra. Needless to say I'm impressed, but I have a few questions.
>
> Firstly, I want to avoid rewriting the entire site if possible, so my
> instinct is to replace the database layer in CodeIgniter. Is this
> something anyone would recommend, and are there any gotchas in doing that?
>
> I can't say I've been terribly happy with PHP accessing Cassandra. When
> sample data of the same size and type was put into MySQL and into
> Cassandra (30K records in a table), the pages with PHP connecting to
> Cassandra took longer to load. I thought maybe my setup needed tweaking,
> and I've played with as many options as I could, but the best I've gotten
> is matching query time. The query speed test simply took timestamps right
> before the query call and right after it returned.
>
> Is this something anyone else has seen? Any comments or suggestions? I've
> tried using Thrift, phpcassa and Pandra with pretty similar numbers.
>
> My other thought was that maybe it was the way I designed my CFs. At first
> I used super columns to model the user account CF, based on a post I read
> by Arin (WTF is a super column), but I later changed to using normal CFs.
>
> I'm trying to make this work, but I get the feeling my approach is
> somewhat... I don't know, misguided.
>
> Here's a breakdown of the current model.
>
> CF:Users{
>     uid
>     fname
>     lname
>     username
>     password
>     street
>     ....
> }
>
> There are some additional columns in place for a user, but I'm keeping it
> simple here.
>
> CF:Library{
>     uid
>     songid
>     ...
>     other info about user library
> }
>
> CF:Songs{
>     songid
>     title
>     artistid
> }
>
> This is all still very relational (considering I go on to have CFs for
> playlists and artists), and I'm not sure it is a good design for the data.
> But when I looked into combining some of the info and removing some CFs, I
> ran into the issue of replicating data all over the place. If, for
> example, I stored the artist name in the library for each record, then the
> artist would be replicated for every song they have, for every user who
> has that song in their library.
>
> Where do you draw the line when deciding how much is okay to replicate?
>
> As much as I dislike the idea of building the application from scratch,
> I'm considering rebuilding it in Java/JSP just to get the benefit of using
> the Hector client. (The efforts of the guys doing the PHP libs are much
> appreciated, but PHP doesn't seem to go too well with Cas.)
>
> I'm in the process of making these decisions because the upgrade/rebuild
> needs to have a fairly steady working version for October, and I don't
> want to go wrong before even starting.
>
> Recommendations, suggestions and advice are all welcome. (Any experience
> with PHP and Cas. is also welcome; since all my fav. libs. are in PHP, I'm
> reluctant to turn away.)

--
Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library

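The replication question in the quoted message (whether to store the artist name alongside each library entry) can be illustrated with a small sketch. This is not Cassandra client code: the column families are stood in for by plain Python dicts of the form {row_key: {column: value}}, and every name and value here (`songs`, `artists`, `library_normalized`, `build_denormalized_row`, the sample rows) is hypothetical. It only shows the trade-off: the normalized layout needs extra lookups per song on every read, while the denormalized layout copies the title and artist name into each user's library row so one row read is enough.

```python
# Hypothetical in-memory stand-ins for column families: {row_key: {column: value}}.
songs = {
    "song1": {"title": "First Song", "artistid": "artist1"},
    "song2": {"title": "Second Song", "artistid": "artist1"},
}
artists = {"artist1": {"name": "Example Artist"}}

# Normalized library: one row per user, one column per song id, no extra data.
library_normalized = {"user1": {"song1": "", "song2": ""}}


def read_library_normalized(uid):
    """Return ([(title, artist_name), ...], number_of_row_lookups)."""
    lookups = 1  # the library row itself
    result = []
    for songid in library_normalized[uid]:
        song = songs[songid]                  # one lookup per song
        artist = artists[song["artistid"]]    # one lookup per artist reference
        lookups += 2
        result.append((song["title"], artist["name"]))
    return result, lookups


def build_denormalized_row(uid):
    """Copy title and artist name into the user's library row at write time.

    A tuple is used as the column value for brevity; in a real column family
    this would be a serialized value or separate columns (an assumption).
    """
    row = {}
    for songid in library_normalized[uid]:
        song = songs[songid]
        row[songid] = (song["title"], artists[song["artistid"]]["name"])
    return row


# Denormalized library: the artist name is repeated for every song in every
# user's library, which is exactly the replication the question worries about.
library_denormalized = {"user1": build_denormalized_row("user1")}


def read_library_denormalized(uid):
    """Return ([(title, artist_name), ...], number_of_row_lookups)."""
    return list(library_denormalized[uid].values()), 1


if __name__ == "__main__":
    print(read_library_normalized("user1"))
    print(read_library_denormalized("user1"))
```

The cost of the second layout is on the write side: if an artist's name changes, every library row containing one of their songs has to be rewritten, so it tends to suit data that is read often and rarely changes.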