Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 49592 invoked from network); 2 Mar 2011 08:07:10 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 2 Mar 2011 08:07:10 -0000 Received: (qmail 73274 invoked by uid 500); 2 Mar 2011 08:07:08 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 73144 invoked by uid 500); 2 Mar 2011 08:07:04 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 73136 invoked by uid 99); 2 Mar 2011 08:07:03 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 02 Mar 2011 08:07:03 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of vodnok@gmail.com designates 209.85.210.172 as permitted sender) Received: from [209.85.210.172] (HELO mail-iy0-f172.google.com) (209.85.210.172) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 02 Mar 2011 08:06:54 +0000 Received: by iyj8 with SMTP id 8so5292391iyj.31 for ; Wed, 02 Mar 2011 00:06:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=tLpcZONMMUQWoyQkbsikhm1VtndAf3/DDBwNaC57NUQ=; b=MEW4+JCvqblzcRppws4hvKlBK0/bdYQwHXFPpGZNg31E/QMtAkQt/fuIjGRxAIszu3 0PZ3+SLj6alchV1yhlJHBGxuGGOQJ3Q2/6HhWdrgElO+XAosYnpaOuEBttlS7GpmHbtW sY0qm4D342E6119cEYAH2w6XjrVID1QlINQuU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=F7XymETmPSQoJ4vtAxXOO/kQpzEXiMDltqEDNLnPI37DZGv9JWuXS1WDKXU25HR7dx lX7Y6WoEJOBQ8YT7JtHqBi8sRsHA7VpvymTaBU+KJZYKMQAp5GvV99bVCGzd47gjKkpp 91wiA27Wlgt1SpT20SoVyBCuVzgk9PreQLyCM= MIME-Version: 1.0 Received: by 10.42.64.209 with SMTP id h17mr7987835ici.247.1299053193258; Wed, 02 Mar 2011 00:06:33 -0800 (PST) Received: by 10.42.213.130 with HTTP; Wed, 2 Mar 2011 00:06:33 -0800 (PST) In-Reply-To: References: Date: Wed, 2 Mar 2011 09:06:33 +0100 Message-ID: Subject: Re: Advice on a design From: Vodnok To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=90e6ba6152fe20e672049d7b64a0 X-Virus-Checked: Checked by ClamAV on apache.org --90e6ba6152fe20e672049d7b64a0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable I think too via Solr it'll be easier. Just need to google it. (if you have links about Solr in php...) I realize that i have to remove some dimension to my CF... I thought it was possible to have SCF -> CF -> SC -> C:value having secondary index on C but has i understood, secondary index on C on super is not possible for now (but will be maybe in next version) As i understand it's better to have more less complex CF then less more complex CF Thank you for your reply, 2011/3/2 Burc Sade > Hi Vodnok, > > For tag searches I would use a search engine like Solr (Lucene), as I thi= nk > it would be more flexible to query. You can update the index as new data > comes in and query it for queries #1, #2 and #4. > > For "All doc of type=3D'BOT' and c_bot_code=3D'ABC'" query, I would creat= e the > CF below. > > doc_types > { > 'BOT:ABC': > { > : > } > } > > You can assign a value you are going to need when after querying to the > docid. The problem with this schema is that if there are not many > type:c_bot_code combinations, there will be many columns under each key i= n > this CF. If a combination has much much more columns than others, hot spo= t > problem may arise. > > > > On Tue, Mar 1, 2011 at 11:39 PM, Vodnok wrote: > >> Hi, >> >> Totaly newbie on Cassandra (with phpcassa) with big background on >> relationned database, i'm would like to use Cassandra for a trivial case= . So >> i'm on it since 3 days. Sorry for my stupid question. I'm pretty sure i'= m >> wrong but i want to learn so i'm here >> >> >> I would like your advise on a design for cassandra. >> >> >> Case: >> >> - Users created Docs and can share docs with friends >> - Users can read and share docs of their friends with other friends >> - Docs can be of different type [text;picture;video;etc] >> - Docs can be taggued >> >> >> >> Typical queries : >> >> >> - Doc relative to tag >> - Doc relative to mutiple tags >> - Doc readed by user x >> - Doc relative to tag and ratio readed_shared greater than x (see design= ) >> - All doc of type=3D'IMG' favorized by my friend >> - All doc of type=3D'BOT' and c_bot_code=3D'ABC' >> - All doc of type=3D'BOT' favorized by my friend relative (tag) with 'fi= re' >> and 'belgium' ? >> >> >> >> Design : >> >> >> docs // all docs >> { >> =91123456=92: //id_docs >> { >> =91t_info=92: >> { >> 'c_type':'BOT' >> 'b_del':'y' >> 'b_publish':'y' >> } >> 't_info_type': >> { >> 'l_title':'Hello World!' >> 'c_bot_code':'ABC' >> } >> 't_read_user' : //read by user x >> { >> //time + id_user >> '123456789_123':'123' >> '123456789_155':'155' >> } >> 't_shared_user' : //share by user x >> { >> //time + id_user >> '123456789_123':'123' >> '123456789_155':'155' >> } >> 't_tags' >> { >> 'fire':'fire' >> 'belgium':'belgium' >> } >> 't_stats' >> { >> 'n_readed':'60' >> 'n_shared':'6' >> 'n_ratio_readed_shared':'0.1' >> } >> } >> } >> >> >> tags_docs // all tag linked to docs >> { >> 'fire'://tag >> { >> //creation_time + id_docs >> '456789_123456': >> { >> 'id_doc':'123456' >> 'time':'456789' >> } >> '456789_223456':'223456': >> { >> 'id_doc':'123456' >> 'time':'456789' >> } >> '456789_323456':'223456': >> { >> 'id_doc':'123456' >> 'time':'456789' >> } >> } >> 'belgium': >> { >> ... >> } >> } >> >> >> users // all users >> { >> =91123=92: //id_user >> { >> =91t_info=92: >> { >> l_name:'Boris' >> c_lang=3D'fr' >> >> } >> 't_readed_docs': >> { >> //time + id_doc >> '123456789_123456':'123456' >> '123458789_136456':'136456' >> } >> 't_shared_docs': >> { >> //time + id_doc >> '123456789_123456':'123456' >> '123458789_136456':'136456' >> } >> } >> } >> >> >> users_docs // all action by users on docs >> { >> =91123_123456=92: // id_user + id_doc >> { >> 'id_doc':'123456' >> 'id_user':'123' >> 'd_readed':'20110301' >> 'd_shared':'20110301' >> } >> } >> >> >> user_friends_act // all activity of user friends >> { >> =91123=92:// id_user >> { >> 't_readed_docs': //all docs readed by my friends >> { >> '223456_224_123456': // time + id_friend + id_docs >> { >> 'id_friend':'224' >> 'id_docs':'123456' >> 'time':'223456' >> 'c_type=3D'BOT' >> } >> } >> 't_shared_docs': //all docs shared by my friends >> { >> '223456_224_123456': // time + id_friend + id_docs >> { >> 'id_friend':'224' >> 'id_docs':'123456' >> 'time':'223456' >> 'c_type=3D'BOT' >> } >> } >> } >> } >> >> >> >> I know that certain queries are not possible for now like : - All doc of >> type=3D'BOT' favorized by my friend relative (tag) with 'fire' and 'belg= ium' ? >> >> >> >> What do you think ? >> >> >> Thank you, >> >> >> Vodnok, >> >> >> (Please remember i'm on cassandra since 3 days) >> > > --90e6ba6152fe20e672049d7b64a0 Content-Type: text/html; charset=windows-1252 Content-Transfer-Encoding: quoted-printable
I think too via Solr it'll be easier. Just need to google it. (if = you have links about Solr in php...)

I realize tha= t i have to remove some dimension to my CF...

I= =A0thought it was possible to have SCF -> CF -&g= t; SC -> C:value having secondary index on C but has i understood, secon= dary index on C on super is not possible for now (but will be maybe in next= version)
As i understand it's better to have more less complex CF then les= s more complex CF

Thank you for your reply,



2011/3/2 Burc Sade <burcsade@gmail.com>
Hi Vodnok,

For tag searc= hes I would use a search engine like Solr (Lucene), as I think it would be = more flexible to query. You can update the index as new data comes in and q= uery it for queries #1, #2 and #4.

For "All doc of type=3D'BOT' and c_bot_cod= e=3D'ABC'" query, I would create the CF below.

<= /div>
doc_types
{
=A0=A0 'BOT:ABC':
=A0=A0{
=A0=A0 =A0<docid>: <creation_date?>=A0
=A0=A0}= =A0
}

You can assign a value you are goi= ng to need when after querying to the docid. The problem with this schema i= s that if there are not many type:c_bot_code combinations, there will be ma= ny columns under each key in this CF. If a combination has much much more c= olumns than others, hot spot problem may arise.



On Tue, Mar 1, 2011 at 11:39 PM, Vodnok <vodnok@gmail.com> = wrote:
Hi,

Totaly newbi= e on Cassandra (with phpcassa) with big background on relationned database,= i'm would like to use Cassandra for a trivial case. So i'm on it s= ince 3 days. Sorry for my stupid question. I'm pretty sure i'm wron= g but i want to learn so i'm here


I would like your advise on a design for= cassandra.


Case:
<= div>
- Users created Docs and can share docs with friends
- Users can read and share docs of their friends with other friends
- Docs can be of different type [text;picture;video;etc]
= - Docs can be taggued



Typical queries :


- Doc relati= ve to tag
- Doc relative to mutiple tags
- Doc readed b= y user x
- Doc relative to tag and ratio readed_shared greater th= an x (see design)
- All doc of type=3D'IMG' favorized by my friend
- A= ll doc of type=3D'BOT' and c_bot_code=3D'ABC'
- A= ll doc of type=3D'BOT' favorized by my friend relative (tag) with &= #39;fire' and 'belgium' ?



Design :


docs // all docs
{
=A0=A0 =A0= =91123456=92: //id_docs
=A0=A0 =A0{
=A0=A0 =A0 =A0 =A0= =91t_info=92:=A0
{<= /div>
'c_type':'B= OT'
'b_del= ':'y'
= 'b_publish':'y'
}
't_info_type':
{
'l_title':'= Hello World!'
= 'c_bot_code':'ABC'
}
't_read_user' : //read by user x=
{
//time + id_user
<= div> '123456789_123':= '123'
'= ;123456789_155':'155'
}
't_shared_user' : //share by use= r x
{
//time + id_user
<= div> '123456789_123':= '123'
'= ;123456789_155':'155'
}
't_tags'
{
'fire':'fir= e'
'belgiu= m':'belgium'
<= /span>}
't_stats'
<= div> {
'n_readed':'60'
'n_shared':'= ;6'
'n_rat= io_readed_shared':'0.1' =
}
}
}

tags_docs // all tag linked to docs
{
'fire'= ;://tag
{
= //creation_time + id_docs
'456789_123456':=
{
'id_doc':'123456'<= /div>
'time':'456= 789'
}
'456789_223456':= 9;223456':
{
'id_doc':'123456'
=
'time':'456= 789'
}
'456789_323456':'223456'= :
{
'id_doc':'1= 23456'
'ti= me':'456789'
<= /span>}
}
'belgium':
{
...
} <= /span>
}


users // all users
{
=A0=A0 =A0=91123=92: //id_user
=A0=A0 =A0{
=A0=A0 =A0 = =A0 =A0=91t_info=92:=A0
{
l_name:'Boris'<= /div>
c_lang=3D'fr&#= 39;

= }
't_readed_docs':=
{
//time + id_doc
'123456789_123456&#= 39;:'123456'
'123458789_136456':'136456'
}
't_shared_docs':
{
//time + id_doc
'123456789_123456'= ;:'123456'
'123458789_136456':'136456'
}
}
}


users_docs // all action by = users on docs
{
=A0=A0 =A0=91123_123456=92: // id_user = + id_doc
=A0=A0 =A0{
'id_doc':'123456'
'id_user':'1= 23'
'd_read= ed':'20110301'
= 'd_shared':'20110301'
}
}
<= br>

user_friends_act // all activity of user frien= ds
{
=A0=A0 =A0=91123=92:// id_user
=A0=A0 =A0{
't_= readed_docs': //all docs readed by my friends
{
'223456_224_123456': // time + id_friend + id_docs
{
'id_friend':'224'
'id_docs':= 9;123456'
'time':'22= 3456'
'c_= type=3D'BOT'
}
}
't_shared_docs': //all docs shared by my friends<= /div>
{
'223456_224_123456': // time + = id_friend + id_docs
{
'id_friend':&#= 39;224'
'= id_docs':'123456'
'time':'22= 3456'
'c_= type=3D'BOT'
}
}
}
}



I know that c= ertain queries are not possible for now like : - All doc of type=3D'BOT= ' favorized by my friend relative (tag) with 'fire' and 'be= lgium' ?



What do you think ?
=


Thank you,


=
Vodnok,


(Please remember= i'm on cassandra since 3 days)


--90e6ba6152fe20e672049d7b64a0--