From: Quyen Pham Ngoc
To: cassandra-user@incubator.apache.org
Subject: Re: Casaandra limitation with super column
Date: Tue, 1 Dec 2009 05:43:18 -0500
In-Reply-To: <4B14E0CB.1040803@gmail.com>

Hi TR,

The query I need is to list a user's emails (with paging support).

Thanks for your reply.
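
A minimal sketch of that query, listing one user's mails a page at a time. It is a plain in-memory Python simulation, not a Cassandra client call: the row is assumed to be a list of (mailId, mailData) columns sorted by mailId, and `list_mail_page`, `start_after` and `page_size` are hypothetical names. The idea is that the last mailId of one page becomes the starting point of the next.

```python
from bisect import bisect_right

def list_mail_page(columns, start_after=None, page_size=2):
    """Return one page of (mail_id, mail_data) columns and the next cursor.

    columns: list of (mail_id, mail_data) tuples sorted by mail_id,
             standing in for one user's row (hypothetical layout).
    start_after: mail_id of the last column of the previous page, or None.
    """
    names = [name for name, _ in columns]
    # Resume strictly after the last column already returned.
    begin = 0 if start_after is None else bisect_right(names, start_after)
    page = columns[begin:begin + page_size]
    # The cursor is None once the final column has been handed out.
    more = begin + page_size < len(columns)
    next_cursor = page[-1][0] if page and more else None
    return page, next_cursor

inbox_row = [("mailId1", "mailData1"), ("mailId2", "mailData2"), ("mailId3", "mailData3")]
first_page, cursor = list_mail_page(inbox_row)
second_page, end = list_mail_page(inbox_row, start_after=cursor)
```

Paging by "last seen column name" rather than by numeric offset means a page does not shift when new mails arrive between two requests.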

On Tue, Dec 1, 2009 at 4:24 AM, Tux Racer <tuxracer69@gmail.com> wrote:
Hello QuyenPN,
You forgot to tell us what typical queries you will run against your database.
If you know the user id and mail id, and just want to get the mail content, then you may not even need super columns:

key= userid_mailid->content:the_content

If you only know the user id and want to get the mails for that user, you could get the mail IDs using a key-ordered scanner.
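
Both lookups can be sketched on a flat `userid_mailid` key. This is an in-memory Python stand-in, not Cassandra's API: a sorted key list plays the role of a store that keeps row keys in order (an assumption; in Cassandra that requires an order-preserving partitioner), and `mail_key`, `get_content` and `mail_ids_for_user` are hypothetical helpers.

```python
from bisect import bisect_left

SEP = "_"  # separator from the userid_mailid example; assumes user ids contain no "_"

def mail_key(user_id, mail_id):
    return f"{user_id}{SEP}{mail_id}"

# Stand-in for a key-ordered store: key -> {"content": ...}
store = {
    mail_key("alice", "m1"): {"content": "hello"},
    mail_key("alice", "m2"): {"content": "lunch?"},
    mail_key("bob", "m1"): {"content": "report"},
}
sorted_keys = sorted(store)

def get_content(user_id, mail_id):
    """Point lookup when both ids are known."""
    return store[mail_key(user_id, mail_id)]["content"]

def mail_ids_for_user(user_id):
    """Key-ordered scan: walk the sorted keys that share the user's prefix."""
    prefix = user_id + SEP
    i = bisect_left(sorted_keys, prefix)
    ids = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        ids.append(sorted_keys[i][len(prefix):])
        i += 1
    return ids
```

The scan only works because all keys of one user sit next to each other in sort order; with hashed key placement the prefix walk would not find them contiguously.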

Cheers
TR


Quyen Pham Ngoc wrote:
Hi all,

I know the limitation of super columns:

"# Cassandra has two levels of indexes: key and column. But in super columnfamilies there is a third level of subcolumns; these are not indexed, and any request for a subcolumn deserializes _all_ the subcolumns in that supercolumn. So you want to avoid a data model that requires large numbers of subcolumns"

I have a mail data model like this:
//Column Family
MailBox{
    userId{//row key
        "inbox":{//super column
            mailId1: mailData1,
            mailId2: mailData2
        },
        "outbox":{//super column
            mailId3: mailData3,
            mailId4: mailData4
        }
    }
}

I know the above design runs into the super column limitation, because the number of emails a user sends and receives grows day by day.
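
To make the concern concrete, here is a toy model, not Cassandra's storage code: each super column is assumed to be kept as one serialized blob (JSON, purely for illustration), so reading a single subcolumn has to decode every subcolumn stored with it. `SuperColumnRow` and `decoded_subcolumns` are hypothetical names.

```python
import json

class SuperColumnRow:
    """Toy row: super column name -> one serialized blob of all its subcolumns."""

    def __init__(self):
        self._blobs = {}
        self.decoded_subcolumns = 0  # counts subcolumns decoded by reads

    def put(self, super_column, subcolumns):
        self._blobs[super_column] = json.dumps(subcolumns)

    def get_subcolumn(self, super_column, name):
        # No index below the super column: decode everything, then pick one.
        subcolumns = json.loads(self._blobs[super_column])
        self.decoded_subcolumns += len(subcolumns)
        return subcolumns[name]

row = SuperColumnRow()
row.put("inbox", {f"mailId{i}": f"mailData{i}" for i in range(1000)})
row.put("outbox", {"mailId1000": "mailData1000"})

one_mail = row.get_subcolumn("inbox", "mailId7")
```

In this model, fetching one mail from a 1000-mail inbox decodes all 1000 subcolumns, and that cost keeps rising as the mailbox grows.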
To avoid this, I have two solutions:
1. Use two column families: InboxMailBox and OutboxMailBox
//Column Family
InboxMailBox{
    userId{//row key
        mailId1: mailData1,
        mailId2: mailData2
    }
}

//Column Family
OutboxMailBox{
    userId{//row key
        mailId3: mailData3,
        mailId4: mailData4
    }
}
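
A rough sketch of solution 1, with each column family modelled as an in-memory Python dict of the form {row key: {column name: value}}. This is only an illustration of the layout, not a Cassandra client; `list_mail_ids`, `get_mail` and the user id `user42` are hypothetical.

```python
# Each column family is modelled as {row_key: {column_name: value}}.
inbox_mailbox = {"user42": {"mailId1": "mailData1", "mailId2": "mailData2"}}
outbox_mailbox = {"user42": {"mailId3": "mailData3", "mailId4": "mailData4"}}

COLUMN_FAMILIES = {"inbox": inbox_mailbox, "outbox": outbox_mailbox}

def list_mail_ids(folder, user_id):
    """List mail ids for one folder; the folder picks the column family."""
    row = COLUMN_FAMILIES[folder].get(user_id, {})
    return sorted(row)

def get_mail(folder, user_id, mail_id):
    """Mail ids are now ordinary columns, so one mail is a direct column read."""
    return COLUMN_FAMILIES[folder][user_id][mail_id]
```

Because mail ids move up from subcolumns to regular columns, they fall under the column-level index mentioned in the quoted limitation.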

2. Use a composite row key: I prepend a prefix to userId, e.g. "inbox" or "outbox"

//Column Family
MailBox{
    prefix + userId{//row key
        mailId1: mailData1,
        mailId2: mailData2
    }
}
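
Solution 2 could be sketched as below, again as an in-memory Python model rather than real Cassandra calls. The separator `SEP` is an assumption that the original `prefix + userId` does not have; it is added so the key can be split back unambiguously. `row_key`, `split_row_key` and `user42` are hypothetical.

```python
SEP = ":"  # hypothetical separator; assumes prefixes never contain it

def row_key(prefix, user_id):
    """Build the composite row key, e.g. 'inbox:user42'."""
    if SEP in prefix:
        raise ValueError("prefix must not contain the separator")
    return f"{prefix}{SEP}{user_id}"

def split_row_key(key):
    """Recover (prefix, user_id); splits on the first separator only."""
    prefix, user_id = key.split(SEP, 1)
    return prefix, user_id

# One column family, modelled as {row_key: {mail_id: mail_data}}.
mailbox = {
    row_key("inbox", "user42"): {"mailId1": "mailData1", "mailId2": "mailData2"},
    row_key("outbox", "user42"): {"mailId3": "mailData3"},
}

inbox_ids = sorted(mailbox[row_key("inbox", "user42")])
```

Compared with solution 1, a new folder only needs a new prefix value instead of a new column family.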

Could you give me some advice?
Thanks a lot for your support.


Best regards,
QuyenPN





--
Best regards,
QuyenPN

Mail: quyenpnq@gmail.com
Tel: 0909 269 792