Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 8787 invoked from network); 1 Mar 2011 23:14:54 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 1 Mar 2011 23:14:54 -0000 Received: (qmail 32520 invoked by uid 500); 1 Mar 2011 23:14:52 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 32485 invoked by uid 500); 1 Mar 2011 23:14:51 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 32477 invoked by uid 99); 1 Mar 2011 23:14:51 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 01 Mar 2011 23:14:51 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of burcsade@gmail.com designates 209.85.214.44 as permitted sender) Received: from [209.85.214.44] (HELO mail-bw0-f44.google.com) (209.85.214.44) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 01 Mar 2011 23:14:45 +0000 Received: by bwz13 with SMTP id 13so5864536bwz.31 for ; Tue, 01 Mar 2011 15:13:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=hajRjusohagdCDvf7M8jZcUokvhe0r30HqYTl3jiB7U=; b=rFp0DjojLPB9QqL1aLyLPwDALlpiBFnqN3tXZxyZJk/qz/hh2BFBRf51TFEKnGseeq Pc2u3lGvU8sD2loYqYZwV+aIDOpUJ0e32lZet+IBSxBoAlt8Dl/AUpI1SqN4Um6j781i rDC0DB3KbhpRB5+YuI+UplakzMnfbHZcf/wAA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=xWqdBVLFI0YDPlq3Vjxc1MVAqJXbpkeM96r3WBYP3zZa3EyK7xB8UcyRtpiAT8VPY0 b1ZJyF6fKuMKE5g389yTtldpEaehwwPykeqTOwu0TUnQcfRT5O79EfQsi7PKB34GO2ml fIvECibmz+BWABDnJfZuMjG29lP8fnCcWw7wI= MIME-Version: 1.0 Received: by 10.204.11.13 with SMTP id r13mr3731796bkr.144.1299021214447; Tue, 01 Mar 2011 15:13:34 -0800 (PST) Received: by 10.204.64.84 with HTTP; Tue, 1 Mar 2011 15:13:34 -0800 (PST) In-Reply-To: References: Date: Wed, 2 Mar 2011 00:13:34 +0100 Message-ID: Subject: Re: Advice on a design From: Burc Sade To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=00032555539e0afa9e049d73f25a --00032555539e0afa9e049d73f25a Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Vodnok, For tag searches I would use a search engine like Solr (Lucene), as I think it would be more flexible to query. You can update the index as new data comes in and query it for queries #1, #2 and #4. For "All doc of type=3D'BOT' and c_bot_code=3D'ABC'" query, I would create = the CF below. doc_types { 'BOT:ABC': { : } } You can assign a value you are going to need when after querying to the docid. The problem with this schema is that if there are not many type:c_bot_code combinations, there will be many columns under each key in this CF. If a combination has much much more columns than others, hot spot problem may arise. On Tue, Mar 1, 2011 at 11:39 PM, Vodnok wrote: > Hi, > > Totaly newbie on Cassandra (with phpcassa) with big background on > relationned database, i'm would like to use Cassandra for a trivial case.= So > i'm on it since 3 days. Sorry for my stupid question. I'm pretty sure i'm > wrong but i want to learn so i'm here > > > I would like your advise on a design for cassandra. > > > Case: > > - Users created Docs and can share docs with friends > - Users can read and share docs of their friends with other friends > - Docs can be of different type [text;picture;video;etc] > - Docs can be taggued > > > > Typical queries : > > > - Doc relative to tag > - Doc relative to mutiple tags > - Doc readed by user x > - Doc relative to tag and ratio readed_shared greater than x (see design) > - All doc of type=3D'IMG' favorized by my friend > - All doc of type=3D'BOT' and c_bot_code=3D'ABC' > - All doc of type=3D'BOT' favorized by my friend relative (tag) with 'fir= e' > and 'belgium' ? > > > > Design : > > > docs // all docs > { > =91123456=92: //id_docs > { > =91t_info=92: > { > 'c_type':'BOT' > 'b_del':'y' > 'b_publish':'y' > } > 't_info_type': > { > 'l_title':'Hello World!' > 'c_bot_code':'ABC' > } > 't_read_user' : //read by user x > { > //time + id_user > '123456789_123':'123' > '123456789_155':'155' > } > 't_shared_user' : //share by user x > { > //time + id_user > '123456789_123':'123' > '123456789_155':'155' > } > 't_tags' > { > 'fire':'fire' > 'belgium':'belgium' > } > 't_stats' > { > 'n_readed':'60' > 'n_shared':'6' > 'n_ratio_readed_shared':'0.1' > } > } > } > > > tags_docs // all tag linked to docs > { > 'fire'://tag > { > //creation_time + id_docs > '456789_123456': > { > 'id_doc':'123456' > 'time':'456789' > } > '456789_223456':'223456': > { > 'id_doc':'123456' > 'time':'456789' > } > '456789_323456':'223456': > { > 'id_doc':'123456' > 'time':'456789' > } > } > 'belgium': > { > ... > } > } > > > users // all users > { > =91123=92: //id_user > { > =91t_info=92: > { > l_name:'Boris' > c_lang=3D'fr' > > } > 't_readed_docs': > { > //time + id_doc > '123456789_123456':'123456' > '123458789_136456':'136456' > } > 't_shared_docs': > { > //time + id_doc > '123456789_123456':'123456' > '123458789_136456':'136456' > } > } > } > > > users_docs // all action by users on docs > { > =91123_123456=92: // id_user + id_doc > { > 'id_doc':'123456' > 'id_user':'123' > 'd_readed':'20110301' > 'd_shared':'20110301' > } > } > > > user_friends_act // all activity of user friends > { > =91123=92:// id_user > { > 't_readed_docs': //all docs readed by my friends > { > '223456_224_123456': // time + id_friend + id_docs > { > 'id_friend':'224' > 'id_docs':'123456' > 'time':'223456' > 'c_type=3D'BOT' > } > } > 't_shared_docs': //all docs shared by my friends > { > '223456_224_123456': // time + id_friend + id_docs > { > 'id_friend':'224' > 'id_docs':'123456' > 'time':'223456' > 'c_type=3D'BOT' > } > } > } > } > > > > I know that certain queries are not possible for now like : - All doc of > type=3D'BOT' favorized by my friend relative (tag) with 'fire' and 'belgi= um' ? > > > > What do you think ? > > > Thank you, > > > Vodnok, > > > (Please remember i'm on cassandra since 3 days) > --00032555539e0afa9e049d73f25a Content-Type: text/html; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Vodnok,

For tag searches I would use a search engine = like Solr (Lucene), as I think it would be more flexible to query. You can = update the index as new data comes in and query it for queries #1, #2 and #= 4.

For "All doc of type=3D'BOT' and c_bot_cod= e=3D'ABC'" query, I would create the CF below.

<= /div>
doc_types
{
=A0=A0 'BOT:ABC':
=A0=A0{
=A0=A0 =A0<docid>: <creation_date?>=A0
=A0=A0}= =A0
}

You can as= sign a value you are going to need when after querying to the docid. The pr= oblem with this schema is that if there are not many type:c_bot_code combin= ations, there will be many columns under each key in this CF. If a combinat= ion has much much more columns than others, hot spot problem may arise.



On Tue, Mar 1, 2011 at 11:39 PM, Vodnok <vodnok@gmail.com> wrote:
Hi,

Totaly newbi= e on Cassandra (with phpcassa) with big background on relationned database,= i'm would like to use Cassandra for a trivial case. So i'm on it s= ince 3 days. Sorry for my stupid question. I'm pretty sure i'm wron= g but i want to learn so i'm here


I would like your advise on a design for= cassandra.


Case:
<= div>
- Users created Docs and can share docs with friends
- Users can read and share docs of their friends with other friends
- Docs can be of different type [text;picture;video;etc]
= - Docs can be taggued



Typical queries :


- Doc relati= ve to tag
- Doc relative to mutiple tags
- Doc readed b= y user x
- Doc relative to tag and ratio readed_shared greater th= an x (see design)
- All doc of type=3D'IMG' favorized by my friend
- A= ll doc of type=3D'BOT' and c_bot_code=3D'ABC'
- A= ll doc of type=3D'BOT' favorized by my friend relative (tag) with &= #39;fire' and 'belgium' ?



Design :


docs // all docs
{
=A0=A0 =A0= =91123456=92: //id_docs
=A0=A0 =A0{
=A0=A0 =A0 =A0 =A0= =91t_info=92:=A0
{<= /div>
'c_type':'B= OT'
'b_del= ':'y'
= 'b_publish':'y'
}
't_info_type':
{
'l_title':'= Hello World!'
= 'c_bot_code':'ABC'
}
't_read_user' : //read by user x=
{
//time + id_user
<= div> '123456789_123':= '123'
'= ;123456789_155':'155'
}
't_shared_user' : //share by use= r x
{
//time + id_user
<= div> '123456789_123':= '123'
'= ;123456789_155':'155'
}
't_tags'
{
'fire':'fir= e'
'belgiu= m':'belgium'
<= /span>}
't_stats'
<= div> {
'n_readed':'60'
'n_shared':'= ;6'
'n_rat= io_readed_shared':'0.1' =
}
}
}

tags_docs // all tag linked to docs
{
'fire'= ;://tag
{
= //creation_time + id_docs
'456789_123456':=
{
'id_doc':'123456'<= /div>
'time':'456= 789'
}
'456789_223456':= 9;223456':
{
'id_doc':'123456'
=
'time':'456= 789'
}
'456789_323456':'223456'= :
{
'id_doc':'1= 23456'
'ti= me':'456789'
<= /span>}
}
'belgium':
{
...
} <= /span>
}


users // all users
{
=A0=A0 =A0=91123=92: //id_user
=A0=A0 =A0{
=A0=A0 =A0 = =A0 =A0=91t_info=92:=A0
{
l_name:'Boris'<= /div>
c_lang=3D'fr&#= 39;

= }
't_readed_docs':=
{
//time + id_doc
'123456789_123456&#= 39;:'123456'
'123458789_136456':'136456'
}
't_shared_docs':
{
//time + id_doc
'123456789_123456'= ;:'123456'
'123458789_136456':'136456'
}
}
}


users_docs // all action by = users on docs
{
=A0=A0 =A0=91123_123456=92: // id_user = + id_doc
=A0=A0 =A0{
'id_doc':'123456'
'id_user':'1= 23'
'd_read= ed':'20110301'
= 'd_shared':'20110301'
}
}
<= br>

user_friends_act // all activity of user frien= ds
{
=A0=A0 =A0=91123=92:// id_user
=A0=A0 =A0{
't_= readed_docs': //all docs readed by my friends
{
'223456_224_123456': // time + id_friend + id_docs
{
'id_friend':'224'
'id_docs':= 9;123456'
'time':'22= 3456'
'c_= type=3D'BOT'
}
}
't_shared_docs': //all docs shared by my friends<= /div>
{
'223456_224_123456': // time + = id_friend + id_docs
{
'id_friend':&#= 39;224'
'= id_docs':'123456'
'time':'22= 3456'
'c_= type=3D'BOT'
}
}
}
}



I know that c= ertain queries are not possible for now like : - All doc of type=3D'BOT= ' favorized by my friend relative (tag) with 'fire' and 'be= lgium' ?



What do you think ?
=


Thank you,


=
Vodnok,


(Please remember= i'm on cassandra since 3 days)

--00032555539e0afa9e049d73f25a--