incubator-cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: Advice on a design
Date Thu, 03 Mar 2011 14:16:02 GMT
Have you considered using Solandra (Solr/Lucene + Cassandra)? See https://github.com/tjake/Lucandra#readme
There is also a #solandra channel on freenode if you have any questions.

On Mar 3, 2011, at 8:00 AM, Vodnok wrote:

> OK, it seems that I'll use Solr (with a dedicated Cassandra) for search.
> 
> I've read this article on RP vs OPP (RandomPartitioner vs OrderPreservingPartitioner): http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
> 
> 
> Here is my case
> 
> 
> docs_shared{ //docs shared by users ordered by time
>     'time:id_user:id_doc' 
>     {
>         'time':'123456' //index on it
>         'id_user':'123' //index on it
>         'c_type':'BOT' //index on it
>         'id_doc':'123' //index on it      
>     }
> } 
> 
> So I can list all docs shared by id_user = 123 with c_type = 'BOT', ordered by time...
> 
> Well, that is what I wanted, until I discovered the RP vs OPP issue. I'm on the default partitioner, which is RP, so row ids are not ordered! And since RP is the recommended one, I would like to stay on RP.
> 
> So another possibility is adding a dimension with a super column, since columns are ordered even with RP.
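To make the RP vs OPP point concrete, here is a minimal Python sketch. It assumes that RandomPartitioner places rows by an MD5-derived token (approximated here as the digest read as a big integer) and that column names inside one row are kept sorted by the comparator. The helpers `rp_token` and `column_slice` are hypothetical names for illustration, not phpcassa or Cassandra calls.

```python
import hashlib
from bisect import bisect_left

def rp_token(row_key: str) -> int:
    """Approximate a RandomPartitioner token: the MD5 of the key as an integer."""
    return int.from_bytes(hashlib.md5(row_key.encode("utf-8")).digest(), "big")

# Row keys shaped like 'time:id_user:id_doc', as in the design above.
row_keys = ["123456:123:1", "123457:123:2", "123458:155:3", "123459:123:4"]

# Under RP a range scan walks rows in token order, not in key order.
rows_in_token_order = sorted(row_keys, key=rp_token)

def column_slice(columns, start, finish):
    """Return the column names in [start, finish) from one row's sorted columns."""
    lo = bisect_left(columns, start)
    hi = bisect_left(columns, finish)
    return columns[lo:hi]

# Inside a single row, column names stay sorted whatever the partitioner is,
# so the same composite names can be range-sliced when stored as columns.
columns = sorted(row_keys)
recent = column_slice(columns, "123457", "123459")
```

Printing `rows_in_token_order` next to `sorted(row_keys)` shows whether the two orders differ for a given set of keys, while `recent` is always a contiguous, time-ordered slice.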
> 
> index{
> docs_shared{ //docs shared by users ordered by time
>     'time:id_user:id_doc' 
>     {
>         'time':'123456' //index on it
>         'id_user':'123' //index on it
>         'c_type':'BOT' //index on it
>         'id_doc':'123' 
>     }
> } 
> }
> 
> BUT... a secondary index is not possible on SC -> C
> 
> 
> So the next possibility is:
> 
> index{
> docs_shared_time_c_type_id_user{ //docs shared by users ordered by time:c_type:id_user
>     'time:c_type:id_user:id_doc' : 'id_doc'
> } 
> docs_shared_c_type_time_id_user{ //docs shared by users ordered by c_type:time:id_user
>     'c_type:time:id_user:id_doc' : 'id_doc'
> } 
> ... (there are 6 orderings of time, c_type and id_user)
> }
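The write cost of this layout can be sketched in a few lines of Python: one share event becomes one column in each of the six index rows. This is only an illustration under assumptions. The function `index_entries` is a hypothetical helper, and the row names simply follow the `docs_shared_<field>_<field>_<field>` pattern used above.

```python
from itertools import permutations

FIELDS = ("time", "c_type", "id_user")

def index_entries(share: dict) -> dict:
    """Map each index row name to the column name written for one share event."""
    entries = {}
    for order in permutations(FIELDS):
        row = "docs_shared_" + "_".join(order)
        # Column name = the three fields in this row's order, then id_doc.
        parts = [share[field] for field in order] + [share["id_doc"]]
        entries[row] = ":".join(parts)
    return entries

share = {"time": "123456", "c_type": "BOT", "id_user": "123", "id_doc": "789"}
entries = index_entries(share)
# Each column's value would be the id_doc itself, as in the layout above.
```

Every share therefore costs six column writes, which is the trade-off the design accepts in exchange for ordered reads.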
> 
> That way I can list with a start key and an end key, and apply filters.
> 
> Example :
> 
> No filter : index -> time:c_type:id_user
> Filter on c_type :  index -> c_type:time:id_user
> Filter on id_user :  index -> id_user:time:c_type
> Filter on c_type and id_user : index -> id_user:c_type:time
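The filter-to-index mapping above can be sketched as a prefix range scan over sorted column names. This Python sketch makes several assumptions that are not in the original message: times are zero-padded so that string order matches time order, field values never contain ':' or ';', and the names `pick_row`, `prefix_scan` and `query` are hypothetical.

```python
from bisect import bisect_left

events = [
    {"time": "0000000100", "c_type": "BOT", "id_user": "123", "id_doc": "d1"},
    {"time": "0000000200", "c_type": "BOT", "id_user": "155", "id_doc": "d2"},
    {"time": "0000000150", "c_type": "IMG", "id_user": "123", "id_doc": "d3"},
]

# The four orderings used in the example above.
ROWS = ["time:c_type:id_user", "c_type:time:id_user",
        "id_user:time:c_type", "id_user:c_type:time"]

# index[row] = sorted composite column names for that ordering.
index = {
    row: sorted(":".join([e[f] for f in row.split(":")] + [e["id_doc"]]) for e in events)
    for row in ROWS
}

def pick_row(filters: dict) -> str:
    """Filtered fields go first, then time, then whatever is left."""
    fixed = [f for f in ("id_user", "c_type") if f in filters]
    rest = [f for f in ("c_type", "id_user") if f not in filters]
    return ":".join(fixed + ["time"] + rest)

def prefix_scan(columns, prefix):
    """Slice the sorted columns that start with 'prefix:'."""
    start = prefix + ":"
    finish = prefix + ";"  # ';' is the ASCII character right after ':'
    return columns[bisect_left(columns, start):bisect_left(columns, finish)]

def query(filters: dict) -> list:
    """Return doc ids ordered by time within the filtered prefix."""
    row = pick_row(filters)
    columns = index[row]
    if filters:
        prefix = ":".join(filters[f] for f in row.split(":") if f in filters)
        columns = prefix_scan(columns, prefix)
    return [name.rsplit(":", 1)[1] for name in columns]
```

For example, `query({"c_type": "BOT"})` reads only the `c_type:time:id_user` row and returns the BOT docs in time order without touching the other rows.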
> 
> Fortunately Cassandra likes writes! (Irony intended.)
> 
> 
> So I have a question: I've read that secondary indexes on SC -> C may arrive in an upcoming release... Is this information true? And is it already planned?
> 
> 
> Thank you,
> 
> Sébastien,
> 
> 2011/3/2 Burc Sade <burcsade@gmail.com>
> You can use the PHP Solr extension. It is a full-featured, lightweight client.
> 
> http://www.php.net/manual/en/book.solr.php
> 
> Without secondary indexes on columns in CFs within SCFs, the best approach at the moment is to create query-specific CFs. In the end it all comes down to how simple you can make your queries, so that you keep the CF count to a minimum.
> 
> Regards,
> Burc
> 
> On Wed, Mar 2, 2011 at 9:06 AM, Vodnok <vodnok@gmail.com> wrote:
> I also think it will be easier via Solr. I just need to google it (if you have links about Solr in PHP...).
> 
> I realize that I have to remove some dimensions from my CF...
> 
> I thought it was possible to have SCF -> CF -> SC -> C:value with a secondary index on C but, as I understand it, a secondary index on C inside a super column is not possible for now (though it may be in a later version).
> As I understand it, it's better to have more CFs that are less complex than fewer CFs that are more complex.
> 
> Thank you for your reply,
> 
> 
> 
> 2011/3/2 Burc Sade <burcsade@gmail.com>
> 
> Hi Vodnok,
> 
> For tag searches I would use a search engine like Solr (Lucene), as I think it would be more flexible to query. You can update the index as new data comes in and query it for queries #1, #2 and #4.
> 
> For the "All doc of type='BOT' and c_bot_code='ABC'" query, I would create the CF below.
> 
> doc_types
> {
>    'BOT:ABC':
>   {
>     <docid>: <creation_date?> 
>   } 
> }
> 
> You can store in each docid column whatever value you are going to need after querying. The problem with this schema is that if there are not many type:c_bot_code combinations, there will be many columns under each key in this CF. If one combination has far more columns than the others, a hot spot problem may arise.
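As a rough illustration of the doc_types CF and of the wide-row concern, here is a small in-memory Python sketch. It is not phpcassa code. The helpers `add_doc`, `docs_for` and `widest_rows` are hypothetical, and a plain dict stands in for the column family.

```python
from collections import defaultdict

# doc_types[row_key][docid] = creation_date, mirroring the CF sketched above.
doc_types = defaultdict(dict)

def add_doc(c_type, c_bot_code, docid, creation_date):
    """Write one column under the 'type:c_bot_code' row."""
    doc_types[f"{c_type}:{c_bot_code}"][docid] = creation_date

def docs_for(c_type, c_bot_code):
    """Answer "all docs of type X and c_bot_code Y" with a single row read."""
    return sorted(doc_types.get(f"{c_type}:{c_bot_code}", {}))

def widest_rows(limit=3):
    """List the rows with the most columns; a very skewed list hints at a hot spot."""
    sizes = {key: len(cols) for key, cols in doc_types.items()}
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:limit]

add_doc("BOT", "ABC", "123456", "20110301")
add_doc("BOT", "ABC", "223456", "20110302")
add_doc("BOT", "XYZ", "323456", "20110302")
```

Here `docs_for("BOT", "ABC")` touches only one row, and `widest_rows()` gives a quick view of how unevenly columns are spread across the type:c_bot_code keys.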
> 
> 
> 
> On Tue, Mar 1, 2011 at 11:39 PM, Vodnok <vodnok@gmail.com> wrote:
> Hi,
> 
> I'm a total newbie on Cassandra (with phpcassa) with a big background in relational databases, and I would like to use Cassandra for a trivial case. I've only been on it for 3 days, so sorry for my stupid question. I'm pretty sure I'm wrong, but I want to learn, so I'm here.
> 
> 
> I would like your advice on a design for Cassandra.
> 
> 
> Case:
> 
> - Users create docs and can share docs with friends
> - Users can read and share docs of their friends with other friends
> - Docs can be of different types [text;picture;video;etc]
> - Docs can be tagged
> 
> 
> 
> Typical queries :
> 
> 
> - Docs for a tag
> - Docs for multiple tags
> - Docs read by user x
> - Docs for a tag with a readed_shared ratio greater than x (see design)
> - All docs of type='IMG' favorited by my friend
> - All docs of type='BOT' and c_bot_code='ABC'
> - All docs of type='BOT' favorited by my friend and tagged with 'fire' and 'belgium'?
> 
> 
> 
> Design :
> 
> 
> docs // all docs
> {
>     '123456': //id_docs
>     {
>         't_info':
>         {
>             'c_type':'BOT'
>             'b_del':'y'
>             'b_publish':'y'
>         }
>         't_info_type':
>         {
>             'l_title':'Hello World!'
>             'c_bot_code':'ABC'
>         }
>         't_read_user': //read by user x
>         {
>             //time + id_user
>             '123456789_123':'123'
>             '123456789_155':'155'
>         }
>         't_shared_user': //shared by user x
>         {
>             //time + id_user
>             '123456789_123':'123'
>             '123456789_155':'155'
>         }
>         't_tags':
>         {
>             'fire':'fire'
>             'belgium':'belgium'
>         }
>         't_stats':
>         {
>             'n_readed':'60'
>             'n_shared':'6'
>             'n_ratio_readed_shared':'0.1'
>         }
>     }
> }
> 
> 
> tags_docs // all tags linked to docs
> {
>     'fire': //tag
>     {
>         //creation_time + id_docs
>         '456789_123456':
>         {
>             'id_doc':'123456'
>             'time':'456789'
>         }
>         '456789_223456':
>         {
>             'id_doc':'223456'
>             'time':'456789'
>         }
>         '456789_323456':
>         {
>             'id_doc':'323456'
>             'time':'456789'
>         }
>     }
>     'belgium':
>     {
>         ...
>     }
> }
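For the "docs for multiple tags" query, one client-side option is to read each tag row and intersect the doc ids. The Python sketch below is a simplification under assumptions: each column value is just the doc id rather than the sub-columns shown above, the sample data is invented, and `docs_with_all_tags` is a hypothetical helper. For tags with very many docs this gets expensive, which is one reason a search engine such as Solr was suggested in this thread.

```python
# tags_docs[tag][creation_time + "_" + id_doc] = id_doc (simplified, invented data).
tags_docs = {
    "fire": {"456789_123456": "123456", "456790_223456": "223456"},
    "belgium": {"456789_123456": "123456", "456795_323456": "323456"},
}

def docs_with_all_tags(tags):
    """Intersect the doc ids stored under each requested tag row."""
    id_sets = [set(tags_docs.get(tag, {}).values()) for tag in tags]
    return sorted(set.intersection(*id_sets)) if id_sets else []

both = docs_with_all_tags(["fire", "belgium"])
```

Each tag costs one row read, and the intersection itself happens in the client.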
> 
> 
> users // all users
> {
>     '123': //id_user
>     {
>         't_info':
>         {
>             'l_name':'Boris'
>             'c_lang':'fr'
>         }
>         't_readed_docs':
>         {
>             //time + id_doc
>             '123456789_123456':'123456'
>             '123458789_136456':'136456'
>         }
>         't_shared_docs':
>         {
>             //time + id_doc
>             '123456789_123456':'123456'
>             '123458789_136456':'136456'
>         }
>     }
> }
> 
> 
> users_docs // all actions by users on docs
> {
>     '123_123456': // id_user + id_doc
>     {
>         'id_doc':'123456'
>         'id_user':'123'
>         'd_readed':'20110301'
>         'd_shared':'20110301'
>     }
> }
> 
> 
> user_friends_act // all activity of a user's friends
> {
>     '123': // id_user
>     {
>         't_readed_docs': //all docs read by my friends
>         {
>             '223456_224_123456': // time + id_friend + id_docs
>             {
>                 'id_friend':'224'
>                 'id_docs':'123456'
>                 'time':'223456'
>                 'c_type':'BOT'
>             }
>         }
>         't_shared_docs': //all docs shared by my friends
>         {
>             '223456_224_123456': // time + id_friend + id_docs
>             {
>                 'id_friend':'224'
>                 'id_docs':'123456'
>                 'time':'223456'
>                 'c_type':'BOT'
>             }
>         }
>     }
> }
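The user_friends_act layout implies a fan-out on write: when a user shares a doc, the event is copied into the activity row of each of that user's friends. The Python sketch below illustrates that idea with nested dicts. The `friends` lookup, the sample ids and the helpers `record_share` and `shared_by_friends` are assumptions for illustration and do not appear in the design above.

```python
from collections import defaultdict

# friends[id_user] = ids of that user's friends (hypothetical lookup data).
friends = {"224": ["123", "300"]}

# user_friends_act[id_user][super_column][column_name] = sub-columns
user_friends_act = defaultdict(lambda: defaultdict(dict))

def record_share(id_friend, id_doc, time, c_type):
    """Copy one share event into the activity row of every friend of the sharer."""
    name = f"{time}_{id_friend}_{id_doc}"  # time + id_friend + id_docs
    for id_user in friends.get(id_friend, []):
        user_friends_act[id_user]["t_shared_docs"][name] = {
            "id_friend": id_friend, "id_docs": id_doc,
            "time": time, "c_type": c_type,
        }

def shared_by_friends(id_user, c_type=None):
    """Read one row in column-name order and filter on c_type client-side."""
    cols = user_friends_act[id_user]["t_shared_docs"]
    return [cols[n]["id_docs"] for n in sorted(cols)
            if c_type in (None, cols[n]["c_type"])]

record_share("224", "123456", "223456", "BOT")
record_share("224", "123457", "223460", "IMG")
```

With this shape, "all docs of type='IMG' shared by my friends" becomes one row read for the user plus a client-side filter, at the cost of one write per friend for every share.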
> 
> 
> 
> I know that certain queries are not possible for now, like: All docs of type='BOT' favorited by my friend and tagged with 'fire' and 'belgium'?
> 
> 
> 
> What do you think ?
> 
> 
> Thank you,
> 
> 
> Vodnok,
> 
> 
> (Please remember I've only been on Cassandra for 3 days)
> 
> 
> 
> 

