Subject: Re: Re: Cassandra DataModeling recommendations
From: Boris Yen
To: user@cassandra.apache.org, pcohen@cegetel.net
Date: Mon, 5 Dec 2011 18:01:39 +0800
In-Reply-To: <28388102.253939.1323077782864.JavaMail.www@wsfrf1210>
References: <30291830.85883.1322560928214.JavaMail.www@wsfrf1128> <28388102.253939.1323077782864.JavaMail.www@wsfrf1210>

I think most of the books on Cassandra are outdated; try the documentation
at http://www.datastax.com/docs/1.0/index instead.

As for TTL, you could read
http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns
for more information.

For composite types, you could read
http://www.slideshare.net/edanuff/indexing-in-cassandra

On Mon, Dec 5, 2011 at 5:36 PM, <pcohen@cegetel.net> wrote:

> Hi
> Thanks for the answer. When I read the book on Cassandra, I was not aware
> of composite keys, which I only recently discovered.
>
> You mentioned a TTL and letting the database remove the data for me. I had
> never read about that. Is it possible without an external batch?
>
> In any case, I will try to rephrase my goal:
>
> Storage:
> - I would like to store several carts (BLOBs) for a user (identified by
>   its id).
> - I would like to attach metadata to these carts, such as an expiration
>   date and possibly others.
>
> Queries/tasks:
> - I would like to be able to retrieve all the carts of a given userId.
> - I would like to have a means to remove expired carts.
>
> I think that this use case is not very complicated, but I was wondering
> whether there is any modelling recommendation.
>
> Thanks for your answer.
>
> Best Regards
>
> ========================================
>
> Not sure I understand your use case, but I think you could use a composite
> column instead of a composite key.
>
> For example,
>
> UserID:{
>     TimeUUID1:CartID1,
>     TimeUUID2:CartID2,
>     TimeUUID3:CartID3,
> }
>
> This way, you could do a slice query on the time if you do not need all
> the carts, and you could also get all the carts in one query.
>
> For expired carts, maybe you could attach a TTL to each column that has a
> time constraint and let the database remove the data for you.
>
> Hi all,
>> In order to evaluate NoSQL solutions and to gain knowledge, I am
>> currently working on a kind of prototype.
>> Here is a brief overview of the scope:
>>
>> I would like to manage user carts. Let's keep things simple:
>> A user can have up to n (let's say 3, for example) carts. Each cart will
>> contain metadata, including an expiration date, and a blob containing
>> stuff (XML in fact, but I really don't care about the content).
>>
>> A user can save, retrieve or delete his carts. Additionally, a dedicated
>> batch process would remove carts that have expired.
>>
>> Basically I was thinking of two ways to model the data:
>> 1- A ColumnFamily with the userid as a key and several SuperColumns,
>> each one describing a Cart and its content.
>> This has the advantage that I can get all the Carts in a single get or
>> do some slice queries to get only some Carts. The problem is that, if I
>> am right, I cannot create a secondary index on the expiration date column
>> inside each Cart.
>> 2- A ColumnFamily with a composite key like userid::cartId containing the
>> expiration date column and the blob. In that case I can create an index
>> to perform a query on the expiration timestamp. The drawback is that if I
>> want to get all the Carts, I need either to create a secondary
>> ColumnFamily listing the carts associated with a userid, or to use a kind
>> of OrderPreservingPartitioner to perform a key-range query.
>>
>> I made some tests and I had some problems.
>> First, I was unable to perform queries in case 2 like:
>>     get Carts where timestamp < xxxxxxx;
>> The (ugly, really!) workaround was to create a fake column always set to
>> true, and the query that worked was:
>>     get Carts where dummy=true and timestamp < xxxxxxx;
>> But I really dislike this solution and I am almost sure this is not the
>> right way to go.
>>
>> I tried something different: creating a dedicated timestamp columnfamily
>> whose key is based on a timestamp and whose columns relate to users and
>> carts. In that case, if I want outdated entries, I could perform a range
>> query on the keys of this columnfamily. But again, in that case I need an
>> OrderPreservingPartitioner, and I fear that using a timestamp as a key
>> would lead to a poor distribution of data among the nodes. If I go with
>> the second proposal (with Standard Columns), the columns could directly
>> be the key, like userId::cartId, and there is no logic in the removal
>> process. If I go with the first solution, I need some logic to analyze
>> the column key or value to get userid + cartid.
>> Another point: if I use this column family, I have to manage "updates".
>> If, for example, I replace Cart2 of user1, I need to remove the
>> corresponding entry and add a new one. This is honestly probably not the
>> hardest part.
>>
>> I have the feeling that having a userId-based ColumnFamily with
>> SuperColumns inside and a dedicated timestamp table is the best choice.
>> In fact I think that basically my requests will be:
>> - Give me all the carts of a userId
>> - Remove all the expired carts, which is probably in fact 2 requests:
>> find all carts whose expiry date is before a given date, then delete
>> what was found.
>>
>> I am fairly new to NoSQL and especially to Cassandra, so I would like to
>> get any advice on:
>> 1- Is Cassandra suited to this kind of storage? I would say yes.
>> 2- What is the right way to model the data and the related constraints?
>>
>> If my description is unclear or anyone needs more details, do not
>> hesitate to ask.
>> Thanks in advance for any help or advice
>>
>> Regards
>>
>> Pascal

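To make the composite column suggestion in this thread a bit more concrete, here is a minimal in-memory sketch in Python. It is only a toy model of the idea, not the Cassandra API: the class `UserCartRow`, its methods, and the integer timestamps are all hypothetical names chosen for illustration. Each column name is a (creation time, cart id) pair, so columns sort by time and can be sliced, and each column carries an expiry so that reads simply stop returning it once the TTL has passed, with no external cleanup batch.

```python
class UserCartRow:
    """Toy stand-in for one wide row keyed by a user id (hypothetical, not a Cassandra API)."""

    def __init__(self):
        # column name (created_at, cart_id) -> (blob, expires_at)
        self._columns = {}

    def put(self, created_at, cart_id, blob, ttl_seconds, now):
        """Write one cart column that expires ttl_seconds after `now`."""
        self._columns[(created_at, cart_id)] = (blob, now + ttl_seconds)

    def slice(self, start, end, now):
        """Return live carts with start <= created_at <= end, in time order."""
        live = []
        # Sorting by column name mimics the time-ordered layout of the row.
        for (created_at, cart_id), (blob, expires_at) in sorted(self._columns.items()):
            if start <= created_at <= end and expires_at > now:
                live.append((created_at, cart_id, blob))
        return live


row = UserCartRow()
row.put(created_at=100, cart_id="cart1", blob=b"<xml/>", ttl_seconds=50, now=100)
row.put(created_at=120, cart_id="cart2", blob=b"<xml/>", ttl_seconds=500, now=120)
row.put(created_at=130, cart_id="cart3", blob=b"<xml/>", ttl_seconds=500, now=130)

all_carts = row.slice(0, 10**9, now=140)     # every cart still alive at t=140
recent = row.slice(125, 10**9, now=140)      # only carts created at t>=125
after_expiry = row.slice(0, 10**9, now=160)  # cart1 expired at t=150
```

In this sketch the "give me all the carts of a userId" request is a full slice of one row, and the "remove expired carts" request disappears because expired columns are filtered out at read time. In a real cluster the column names would be TimeUUIDs rather than integers, and the TTL would be set on the write itself.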