incubator-cassandra-user mailing list archives

From Burc Sade <burcs...@gmail.com>
Subject Re: Advice on a design
Date Wed, 02 Mar 2011 08:47:43 GMT
You can use the PHP Solr extension. It is a full-featured, lightweight
client.

http://www.php.net/manual/en/book.solr.php

Since secondary indexes are not available on columns in CFs within SCFs, the
best approach at the moment is to create query-specific CFs. In the end it all
comes down to how simple you can make your queries, so that you need as few
CFs as possible.
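
To illustrate what a query-specific CF means here, below is a minimal sketch
in Python. It uses plain dictionaries as stand-ins for column families and
makes no Cassandra client calls. The function names and the sample titles and
dates are made up for illustration; only the doc_types layout ('BOT:ABC' row
key, docid columns) comes from my earlier message quoted below.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families,
# each shaped as: row key -> {column name: value}.
docs = {}                      # row key: doc id
doc_types = defaultdict(dict)  # row key: "<type>:<c_bot_code>", columns: doc id -> creation date


def index_key(doc_type, bot_code):
    """Build the composite row key used by the query-specific CF."""
    return f"{doc_type}:{bot_code}"


def insert_doc(doc_id, doc_type, bot_code, title, created):
    """Write the document row, then the denormalized index column for the query."""
    docs[doc_id] = {"c_type": doc_type, "c_bot_code": bot_code, "l_title": title}
    doc_types[index_key(doc_type, bot_code)][doc_id] = created


def docs_by_type_and_code(doc_type, bot_code):
    """Answer 'all docs of type X with c_bot_code Y' by reading a single row."""
    row = doc_types.get(index_key(doc_type, bot_code), {})
    return sorted(row)


# Illustrative sample data only.
insert_doc("123456", "BOT", "ABC", "Hello World!", "20110301")
insert_doc("223456", "BOT", "ABC", "Second doc", "20110302")
insert_doc("323456", "IMG", "", "A picture", "20110302")

print(docs_by_type_and_code("BOT", "ABC"))
```

The point is that every write goes to both places, so the read side never has
to filter: each query you want to support gets its own row layout, which is
why simpler queries mean fewer CFs to maintain.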

Regards,
Burc

On Wed, Mar 2, 2011 at 9:06 AM, Vodnok <vodnok@gmail.com> wrote:

> I also think it will be easier with Solr. I just need to google it (if you
> have links about Solr in PHP...).
>
> I realize that I have to remove some dimensions from my CF...
>
> I thought it was possible to have SCF -> CF -> SC -> C:value with a
> secondary index on C, but as I understand it, a secondary index on C within
> a super column family is not possible for now (but maybe it will be in the
> next version).
> As I understand it, it's better to have more, less complex CFs than fewer,
> more complex CFs.
>
> Thank you for your reply,
>
>
>
> 2011/3/2 Burc Sade <burcsade@gmail.com>
>
>> Hi Vodnok,
>>
>> For tag searches I would use a search engine like Solr (Lucene), as I
>> think it would be more flexible to query. You can update the index as new
>> data comes in and query it for queries #1, #2 and #4.
>>
>> For the "All docs of type='BOT' and c_bot_code='ABC'" query, I would create
>> the CF below.
>>
>> doc_types
>> {
>>    'BOT:ABC':
>>   {
>>     <docid>: <creation_date?>
>>   }
>> }
>>
>> As the value of each docid column, you can store whatever you are going to
>> need after querying. The problem with this schema is that if there are not
>> many type:c_bot_code combinations, there will be many columns under each key
>> in this CF. If one combination has far more columns than the others, a hot
>> spot problem may arise.
>>
>>
>>
>> On Tue, Mar 1, 2011 at 11:39 PM, Vodnok <vodnok@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm a total newbie on Cassandra (with phpcassa) with a strong background
>>> in relational databases, and I would like to use Cassandra for a trivial
>>> case. I have been on it for 3 days. Sorry for my stupid question. I'm
>>> pretty sure I'm wrong, but I want to learn, so here I am.
>>>
>>>
>>> I would like your advice on a design for Cassandra.
>>>
>>>
>>> Case:
>>>
>>> - Users create docs and can share docs with friends
>>> - Users can read their friends' docs and share them with other friends
>>> - Docs can be of different types [text;picture;video;etc]
>>> - Docs can be tagged
>>>
>>>
>>>
>>> Typical queries :
>>>
>>>
>>> - Docs related to a tag
>>> - Docs related to multiple tags
>>> - Docs read by user x
>>> - Docs related to a tag with ratio readed_shared greater than x (see design)
>>> - All docs of type='IMG' favorited by my friend
>>> - All docs of type='BOT' and c_bot_code='ABC'
>>> - All docs of type='BOT' favorited by my friend and related (tag) to 'fire'
>>> and 'belgium' ?
>>>
>>>
>>>
>>> Design :
>>>
>>>
>>> docs // all docs
>>> {
>>>     '123456': // id_docs
>>>     {
>>>         't_info':
>>>         {
>>>             'c_type':'BOT'
>>>             'b_del':'y'
>>>             'b_publish':'y'
>>>         }
>>>         't_info_type':
>>>         {
>>>             'l_title':'Hello World!'
>>>             'c_bot_code':'ABC'
>>>         }
>>>         't_read_user': // read by user x
>>>         {
>>>             // time + id_user
>>>             '123456789_123':'123'
>>>             '123456789_155':'155'
>>>         }
>>>         't_shared_user': // shared by user x
>>>         {
>>>             // time + id_user
>>>             '123456789_123':'123'
>>>             '123456789_155':'155'
>>>         }
>>>         't_tags':
>>>         {
>>>             'fire':'fire'
>>>             'belgium':'belgium'
>>>         }
>>>         't_stats':
>>>         {
>>>             'n_readed':'60'
>>>             'n_shared':'6'
>>>             'n_ratio_readed_shared':'0.1'
>>>         }
>>>     }
>>> }
>>>
>>>
>>> tags_docs // all tags linked to docs
>>> {
>>>     'fire': // tag
>>>     {
>>>         // creation_time + id_docs
>>>         '456789_123456':
>>>         {
>>>             'id_doc':'123456'
>>>             'time':'456789'
>>>         }
>>>         '456789_223456':
>>>         {
>>>             'id_doc':'223456'
>>>             'time':'456789'
>>>         }
>>>         '456789_323456':
>>>         {
>>>             'id_doc':'323456'
>>>             'time':'456789'
>>>         }
>>>     }
>>>     'belgium':
>>>     {
>>>         ...
>>>     }
>>> }
>>>
>>>
>>> users // all users
>>> {
>>>     '123': // id_user
>>>     {
>>>         't_info':
>>>         {
>>>             'l_name':'Boris'
>>>             'c_lang':'fr'
>>>         }
>>>         't_readed_docs':
>>>         {
>>>             // time + id_doc
>>>             '123456789_123456':'123456'
>>>             '123458789_136456':'136456'
>>>         }
>>>         't_shared_docs':
>>>         {
>>>             // time + id_doc
>>>             '123456789_123456':'123456'
>>>             '123458789_136456':'136456'
>>>         }
>>>     }
>>> }
>>>
>>>
>>> users_docs // all actions by users on docs
>>> {
>>>     '123_123456': // id_user + id_doc
>>>     {
>>>         'id_doc':'123456'
>>>         'id_user':'123'
>>>         'd_readed':'20110301'
>>>         'd_shared':'20110301'
>>>     }
>>> }
>>>
>>>
>>> user_friends_act // all activity of a user's friends
>>> {
>>>     '123': // id_user
>>>     {
>>>         't_readed_docs': // all docs read by my friends
>>>         {
>>>             '223456_224_123456': // time + id_friend + id_docs
>>>             {
>>>                 'id_friend':'224'
>>>                 'id_docs':'123456'
>>>                 'time':'223456'
>>>                 'c_type':'BOT'
>>>             }
>>>         }
>>>         't_shared_docs': // all docs shared by my friends
>>>         {
>>>             '223456_224_123456': // time + id_friend + id_docs
>>>             {
>>>                 'id_friend':'224'
>>>                 'id_docs':'123456'
>>>                 'time':'223456'
>>>                 'c_type':'BOT'
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>>
>>>
>>> I know that certain queries are not possible for now, such as: all docs of
>>> type='BOT' favorited by my friend and related (tag) to 'fire' and 'belgium'.
>>>
>>>
>>>
>>> What do you think?
>>>
>>>
>>> Thank you,
>>>
>>>
>>> Vodnok,
>>>
>>>
>>> (Please remember I have only been on Cassandra for 3 days)
>>>
>>
>>
>
