incubator-cassandra-user mailing list archives

From Vodnok <vod...@msn.com>
Subject Re: Advice on a design
Date Thu, 03 Mar 2011 14:00:34 GMT
OK, it seems that I'll use Solr (with a dedicated Cassandra) for search.

I've read this article on RP vs OPP (RandomPartitioner vs OrderPreservingPartitioner):
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/


Here is my case:


docs_shared{ //docs shared by users ordered by time
    'time:id_user:id_doc'
    {
        'time':'123456' //index on it
        'id_user':'123' //index on it
        'c_type':'BOT' //index on it
        'id_doc':'123' //index on it
    }
}

So I can list all docs shared by id_user = 123 with type = 'BOT', ordered by
time...

Well, that was the plan, until I discovered the RP vs OPP issue. I'm on the
default partitioner, RP, so row keys are not ordered!!! And since RP is the
recommended choice, I would like to stay with it.
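The RP point above can be illustrated with a small Python simulation. It assumes RandomPartitioner's token scheme (the MD5 digest of the row key, read as a signed big-endian integer, absolute value) and models column names as plain strings compared byte-wise, as with BytesType or UTF8Type on ASCII names. It is a sketch of the ordering behaviour, not Cassandra code.

```python
import hashlib


def rp_token(row_key: str) -> int:
    """Token assumed for RandomPartitioner: MD5 digest of the key,
    read as a signed big-endian integer, then abs()."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def range_scan_order(row_keys):
    """Order in which a full range scan visits rows under RP: by token, not by key."""
    return sorted(row_keys, key=rp_token)


def column_order(column_names):
    """Order of columns inside one row: by name (byte-wise comparator),
    whatever the partitioner."""
    return sorted(column_names)


if __name__ == "__main__":
    # 'time:id_user:id_doc' strings, as in docs_shared above
    keys = [f"{100000 + i}:123:{i}" for i in range(20)]
    print(range_scan_order(keys)[:5])        # hash order, not time order in general
    print(column_order(reversed(keys))[:5])  # name order, i.e. time order here
```

Rows come back in token order, so a key range over 'time:...' row keys is not meaningful under RP, while the same strings used as column names inside one row come back in time order, provided the time part has a fixed width.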

So another possibility is to add a dimension with a super column, since
columns (unlike row keys) are ordered even with RP.

index{
docs_shared{ //docs shared by users ordered by time
    'time:id_user:id_doc'
    {
        'time':'123456' //index on it
        'id_user':'123' //index on it
        'c_type':'BOT' //index on it
        'id_doc':'123'
    }
}
}

BUT... a secondary index is not possible on the columns of a super column (SC -> C).


So next possibility is

index{
docs_shared_time_c_type_id_user{ //docs shared by users, ordered by time:c_type:id_user
    'time:c_type:id_user:id_doc' : 'id_doc'
}
docs_shared_c_type_time_id_user{ //docs shared by users, ordered by c_type:time:id_user
    'c_type:time:id_user:id_doc' : 'id_doc'
}
... (there are 6 orderings of time, c_type and id_user)
}
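Rather than maintaining the six index variants by hand, the column for each one can be derived from the document. A minimal Python sketch, assuming one index per field ordering named docs_shared_<f1>_<f2>_<f3> and the column layout '<v1>:<v2>:<v3>:<id_doc>' -> id_doc from the schema above; the helper index_columns is hypothetical, and the actual writes to Cassandra are left out.

```python
from itertools import permutations

FIELDS = ("time", "c_type", "id_user")


def index_columns(doc):
    """Return {index_name: (column_name, value)} for every ordering of FIELDS.

    Hypothetical helper: index names follow docs_shared_<f1>_<f2>_<f3> and
    column names are '<v1>:<v2>:<v3>:<id_doc>', as in the schema above.
    """
    out = {}
    for order in permutations(FIELDS):
        index_name = "docs_shared_" + "_".join(order)
        column_name = ":".join([doc[f] for f in order] + [doc["id_doc"]])
        out[index_name] = (column_name, doc["id_doc"])
    return out


if __name__ == "__main__":
    doc = {"time": "123456", "c_type": "BOT", "id_user": "123", "id_doc": "123"}
    for name, (column, value) in index_columns(doc).items():
        print(name, column, value)  # six column writes per shared doc
```

Every shared doc therefore costs six column writes, which is the write-heavy trade-off this design accepts.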

This way I can list using a start key and an end key, plus filters.

Example:

No filter: index -> time:c_type:id_user
Filter on c_type: index -> c_type:time:id_user
Filter on id_user: index -> id_user:time:c_type
Filter on c_type and id_user: index -> id_user:c_type:time
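Reading then amounts to picking the index whose leading fields are the filtered ones and slicing its columns from "prefix:" to the end of that prefix. The Python sketch below simulates that column slice in memory with bisect; with phpcassa this would presumably be a start/finish column slice on the index row, but that client call is not shown. It assumes ':' never appears inside values and that times are zero-padded to a fixed width, because columns sort as strings ('99' sorts after '100'). The helper names are hypothetical.

```python
import bisect
from itertools import permutations

FIELDS = ("time", "c_type", "id_user")

# Index to read for each set of filtered fields (the mapping listed above).
INDEX_FOR_FILTERS = {
    frozenset(): ("time", "c_type", "id_user"),
    frozenset({"c_type"}): ("c_type", "time", "id_user"),
    frozenset({"id_user"}): ("id_user", "time", "c_type"),
    frozenset({"c_type", "id_user"}): ("id_user", "c_type", "time"),
}


def build_indexes(docs):
    """In-memory stand-in for the six index rows: field order -> sorted column names."""
    return {
        order: sorted(":".join([d[f] for f in order] + [d["id_doc"]]) for d in docs)
        for order in permutations(FIELDS)
    }


def column_slice(sorted_names, start, finish):
    """Names n with start <= n < finish (empty finish means 'to the end')."""
    lo = bisect.bisect_left(sorted_names, start)
    hi = bisect.bisect_left(sorted_names, finish) if finish else len(sorted_names)
    return sorted_names[lo:hi]


def query(indexes, filters):
    """id_doc values matching `filters` (e.g. {"c_type": "BOT"}), oldest first."""
    order = INDEX_FOR_FILTERS[frozenset(filters)]
    prefix = ":".join(filters[f] for f in order[: len(filters)])
    # ';' sorts right after ':', so [prefix + ':', prefix + ';') holds exactly
    # the column names that start with 'prefix:'.
    start, finish = (prefix + ":", prefix + ";") if prefix else ("", "")
    names = column_slice(indexes[order], start, finish)
    return [name.rsplit(":", 1)[1] for name in names]


if __name__ == "__main__":
    # Sample data; times are zero-padded so string order matches time order.
    docs = [
        {"time": "000100", "c_type": "BOT", "id_user": "123", "id_doc": "a"},
        {"time": "000200", "c_type": "IMG", "id_user": "123", "id_doc": "b"},
        {"time": "000300", "c_type": "BOT", "id_user": "155", "id_doc": "c"},
        {"time": "000400", "c_type": "BOT", "id_user": "123", "id_doc": "d"},
    ]
    indexes = build_indexes(docs)
    print(query(indexes, {"c_type": "BOT"}))                    # ['a', 'c', 'd']
    print(query(indexes, {"c_type": "BOT", "id_user": "123"}))  # ['a', 'd']
```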

Fortunately, Cassandra loves writes!!! (irony intended)


So I have a question: I've read that secondary indexes on super column
subcolumns (SC -> C) may arrive in a future release... Is this true? And is it
already planned?


Thank you,

Sébastien,

2011/3/2 Burc Sade <burcsade@gmail.com>

> You can use the PHP Solr extension. It is a fully featured, lightweight
> client.
>
> http://www.php.net/manual/en/book.solr.php
>
> Without secondary indexes on the subcolumns of super column families (SCFs),
> the best approach at the moment is to create query-specific CFs. In the end
> it all comes down to keeping your queries simple enough that you need as few
> CFs as possible.
>
> Regards,
> Burc
>
> On Wed, Mar 2, 2011 at 9:06 AM, Vodnok <vodnok@gmail.com> wrote:
>
>> I also think it'll be easier via Solr. I just need to google it (if you
>> have any links about Solr in PHP...).
>>
>> I realize that I have to remove some dimensions from my CF...
>>
>> I thought it was possible to have SCF -> CF -> SC -> C:value with a
>> secondary index on C, but as I understand it, a secondary index on C in a
>> super column family is not possible for now (but maybe in a future version).
>> As I understand it, it's better to have more, simpler CFs than fewer, more
>> complex ones.
>>
>> Thank you for your reply,
>>
>>
>>
>> 2011/3/2 Burc Sade <burcsade@gmail.com>
>>
>>> Hi Vodnok,
>>>
>>> For tag searches I would use a search engine like Solr (Lucene), as I
>>> think it would be more flexible to query. You can update the index as new
>>> data comes in and query it for queries #1, #2 and #4.
>>>
>>> For the "All doc of type='BOT' and c_bot_code='ABC'" query, I would create
>>> the CF below.
>>>
>>> doc_types
>>> {
>>>    'BOT:ABC':
>>>   {
>>>     <docid>: <creation_date?>
>>>   }
>>> }
>>>
>>> As the value of each docid column, you can store whatever you will need
>>> after querying. The problem with this schema is that if there are not many
>>> type:c_bot_code combinations, there will be many columns under each key in
>>> this CF. If one combination has far more columns than the others, a hot
>>> spot problem may arise.
>>>
>>>
>>>
>>> On Tue, Mar 1, 2011 at 11:39 PM, Vodnok <vodnok@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm a total newbie to Cassandra (using phpcassa) with a big background in
>>>> relational databases, and I would like to use Cassandra for a trivial case.
>>>> I've been at it for 3 days, so sorry for my stupid questions. I'm pretty
>>>> sure I'm wrong, but I want to learn, so here I am.
>>>>
>>>>
>>>> I would like your advice on a design for Cassandra.
>>>>
>>>>
>>>> Case:
>>>>
>>>> - Users create docs and can share docs with friends
>>>> - Users can read their friends' docs and share them with other friends
>>>> - Docs can be of different types [text;picture;video;etc]
>>>> - Docs can be tagged
>>>>
>>>>
>>>>
>>>> Typical queries:
>>>>
>>>>
>>>> - Docs related to a tag
>>>> - Docs related to multiple tags
>>>> - Docs read by user x
>>>> - Docs related to a tag with a read/shared ratio (n_ratio_readed_shared)
>>>> greater than x (see design)
>>>> - All docs of type='IMG' favorited by my friends
>>>> - All docs of type='BOT' and c_bot_code='ABC'
>>>> - All docs of type='BOT' favorited by my friends and related (tag) to
>>>> 'fire' and 'belgium'?
>>>>
>>>>
>>>>
>>>> Design:
>>>>
>>>>
>>>> docs // all docs
>>>> {
>>>>     '123456': //id_docs
>>>>     {
>>>>         't_info':
>>>>         {
>>>>             'c_type':'BOT'
>>>>             'b_del':'y'
>>>>             'b_publish':'y'
>>>>         }
>>>>         't_info_type':
>>>>         {
>>>>             'l_title':'Hello World!'
>>>>             'c_bot_code':'ABC'
>>>>         }
>>>>         't_read_user': //read by user x
>>>>         {
>>>>             //time + id_user
>>>>             '123456789_123':'123'
>>>>             '123456789_155':'155'
>>>>         }
>>>>         't_shared_user': //shared by user x
>>>>         {
>>>>             //time + id_user
>>>>             '123456789_123':'123'
>>>>             '123456789_155':'155'
>>>>         }
>>>>         't_tags':
>>>>         {
>>>>             'fire':'fire'
>>>>             'belgium':'belgium'
>>>>         }
>>>>         't_stats':
>>>>         {
>>>>             'n_readed':'60'
>>>>             'n_shared':'6'
>>>>             'n_ratio_readed_shared':'0.1'
>>>>         }
>>>>     }
>>>> }
>>>>
>>>>
>>>> tags_docs // all tags linked to docs
>>>> {
>>>>     'fire': //tag
>>>>     {
>>>>         //creation_time + id_docs
>>>>         '456789_123456':
>>>>         {
>>>>             'id_doc':'123456'
>>>>             'time':'456789'
>>>>         }
>>>>         '456789_223456':
>>>>         {
>>>>             'id_doc':'223456'
>>>>             'time':'456789'
>>>>         }
>>>>         '456789_323456':
>>>>         {
>>>>             'id_doc':'323456'
>>>>             'time':'456789'
>>>>         }
>>>>     }
>>>>     'belgium':
>>>>     {
>>>>         ...
>>>>     }
>>>> }
>>>>
>>>>
>>>> users // all users
>>>> {
>>>>     '123': //id_user
>>>>     {
>>>>         't_info':
>>>>         {
>>>>             'l_name':'Boris'
>>>>             'c_lang':'fr'
>>>>         }
>>>>         't_readed_docs':
>>>>         {
>>>>             //time + id_doc
>>>>             '123456789_123456':'123456'
>>>>             '123458789_136456':'136456'
>>>>         }
>>>>         't_shared_docs':
>>>>         {
>>>>             //time + id_doc
>>>>             '123456789_123456':'123456'
>>>>             '123458789_136456':'136456'
>>>>         }
>>>>     }
>>>> }
>>>>
>>>>
>>>> users_docs // all actions by users on docs
>>>> {
>>>>     '123_123456': // id_user + id_doc
>>>>     {
>>>>         'id_doc':'123456'
>>>>         'id_user':'123'
>>>>         'd_readed':'20110301'
>>>>         'd_shared':'20110301'
>>>>     }
>>>> }
>>>>
>>>>
>>>> user_friends_act // all activity of the user's friends
>>>> {
>>>>     '123': // id_user
>>>>     {
>>>>         't_readed_docs': //all docs read by my friends
>>>>         {
>>>>             '223456_224_123456': // time + id_friend + id_docs
>>>>             {
>>>>                 'id_friend':'224'
>>>>                 'id_docs':'123456'
>>>>                 'time':'223456'
>>>>                 'c_type':'BOT'
>>>>             }
>>>>         }
>>>>         't_shared_docs': //all docs shared by my friends
>>>>         {
>>>>             '223456_224_123456': // time + id_friend + id_docs
>>>>             {
>>>>                 'id_friend':'224'
>>>>                 'id_docs':'123456'
>>>>                 'time':'223456'
>>>>                 'c_type':'BOT'
>>>>             }
>>>>         }
>>>>     }
>>>> }
>>>>
>>>>
>>>>
>>>> I know that certain queries are not possible for now, like: "All docs of
>>>> type='BOT' favorited by my friends and related (tag) to 'fire' and
>>>> 'belgium'".
>>>>
>>>>
>>>>
>>>> What do you think?
>>>>
>>>>
>>>> Thank you,
>>>>
>>>>
>>>> Vodnok,
>>>>
>>>>
>>>> (Please remember I've only been using Cassandra for 3 days.)
>>>>
>>>
>>>
>>
>
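Burc's doc_types suggestion can be sketched with plain Python dicts standing in for the CF: one row per 'type:c_bot_code' key, one column per document (name = docid, value = creation date, which is only one possible choice of stored value). The helper names and the creation_date field are hypothetical, and no phpcassa or Thrift calls are shown. Counting columns per row gives a quick view of the hot-spot risk he mentions, since all columns of a row are stored on the same replicas.

```python
from collections import defaultdict


def build_doc_types(docs):
    """doc_types CF as nested dicts: row key 'type:c_bot_code' -> {docid: creation_date}."""
    cf = defaultdict(dict)
    for d in docs:
        cf[f"{d['c_type']}:{d['c_bot_code']}"][d["id_doc"]] = d["creation_date"]
    return cf


def docs_of(cf, c_type, c_bot_code):
    """'All doc of type=X and c_bot_code=Y' is a single-row read; docids in name order."""
    return sorted(cf.get(f"{c_type}:{c_bot_code}", {}))


def row_sizes(cf):
    """(row key, column count) pairs, widest row first; a row far wider than
    the others is the hot-spot risk."""
    return sorted(((k, len(v)) for k, v in cf.items()), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Sample data for illustration only.
    docs = [
        {"id_doc": "123456", "c_type": "BOT", "c_bot_code": "ABC", "creation_date": "20110301"},
        {"id_doc": "223456", "c_type": "BOT", "c_bot_code": "ABC", "creation_date": "20110302"},
        {"id_doc": "323456", "c_type": "BOT", "c_bot_code": "XYZ", "creation_date": "20110302"},
    ]
    cf = build_doc_types(docs)
    print(docs_of(cf, "BOT", "ABC"))  # ['123456', '223456']
    print(row_sizes(cf))              # [('BOT:ABC', 2), ('BOT:XYZ', 1)]
```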
