cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eduardo Alonso <eduardoalo...@stratio.com>
Subject Re: Reg:- Cassandra Data modelling for Search
Date Mon, 12 Jun 2017 10:27:56 GMT
Using cassandra collections

CREATE TABLE videos (
videoid uuid primary key,
title text,
actor list<text>,
producer list<text>,
release_date timestamp,
description text,
music text,
etc...
);

When using collection you need to take care of its length. Collections are
designed to store
<http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html>only
a small amount of data
<http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html>.
5/10 actors per movie is ok.


Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
<https://twitter.com/StratioBD>*

2017-06-12 11:54 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.com>:

> So In short we have to go with one single table as videos and put primary
> key as videoid uuid.
> But then how can we able to handle multiple actor name and producer name.
> ?
>
> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso <eduardoalonso@stratio.com
> > wrote:
>
>> Yes, you are right.
>>
>> Table denormalization is useful just when you have unique primary keys,
>> not your case.
>> Denormalized tables are only different in its primary key, every
>> denormalized table contains all the data (it just change how it is
>> structured). So, if you need to index it, do it with just one table (the
>> one you showed us with videoid as the primary key is ok).
>>
>> Solr, Elastic and cassandra-lucene-index are both based on Lucene and all
>> of them fulfill all your needs.
>>
>> Solr (in DSE) and cassandra-lucene-index
>> <https://github.com/stratio/cassandra-lucene-index> are very well
>> integrated with cassandra using its secondary index interface. If you
>> choose elastic search you will need to code the integration (write mutex,
>> both cluster synchronization (imagine something written in cassandra but
>> failed to write in elastic))
>>
>> I know i am not the most suitable to recommend you to use our product
>> cassandra-lucene-index
>> <https://github.com/stratio/cassandra-lucene-index> but it is open
>> source, just take a look.
>>
>> Eduardo Alonso
>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // *@stratiobd
>> <https://twitter.com/StratioBD>*
>>
>> 2017-06-12 11:18 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.com>:
>>
>>> Hi Eduardo,
>>>
>>> And As we are trying to build an advanced search functionality in which
>>> we can able to do partial search based on actor, producer, director, etc.
>>> columns.
>>> So if we do denormalization of tables then we have to create tables such
>>> as below :-
>>> video_by_actor
>>> video_by_producer
>>> video_by_director
>>> video_by_date
>>> etc..
>>> By using denormalized, Cassandra only allows us to do equality search,
>>> but for implementing Partial search we need to implement solr on all above
>>> tables.
>>>
>>> This is my thinking, but I think this will be not correct way to
>>> implement Apache Solr on all tables.
>>>
>>> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ <nandanpriyadarshi298@gmail.co
>>> m> wrote:
>>>
>>>> Hi Edurado,
>>>>
>>>> As you mentioned queries 1-6 ,
>>>> In this condition, we have to proceed with a table like as below :-
>>>> create table videos (
>>>> videoid uuid primary key,
>>>> title text,
>>>> actor text,
>>>> producer text,
>>>> release_date timestamp,
>>>> description text,
>>>> music text,
>>>> etc...
>>>> );
>>>> This table will help to store video datas based on PK videoid and will
>>>> give uniqeness due to uuid.
>>>> But as we know , in one movie there are multiple actor, multiple
>>>> producer, multiple music worked, So how can we store all these.. Only one
>>>> option will left as to use collection type columns.
>>>>
>>>>
>>>> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso <
>>>> eduardoalonso@stratio.com> wrote:
>>>>
>>>>> TLDR shouldBe *PD
>>>>>
>>>>> Eduardo Alonso
>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>> 28224 Pozuelo de Alarcón, Madrid
>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com
// *@stratiobd
>>>>> <https://twitter.com/StratioBD>*
>>>>>
>>>>> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso <eduardoalonso@stratio.com>:
>>>>>
>>>>>> Hi Nandan:
>>>>>>
>>>>>> So, your system must provide these queries:
>>>>>>
>>>>>> 1 - SELECT video FROM ... WHERE actor = '...';
>>>>>> 2 - SELECT video FROM ... WHERE producer = '...';
>>>>>> 3 - SELECT video FROM ... WHERE music = '...';
>>>>>> 4 - SELECT video FROM ... WHERE actor = '...' AND producer ='...';
>>>>>> 5 - SELECT video FROM ... WHERE actor = '...' AND music = '...';
>>>>>> 6 - SELECT video WHERE title CONTAINS 'Harry';
>>>>>>
>>>>>>
>>>>>> For queries 1-5 you can get them with just cassandra, denormalizing
>>>>>> tables just the way your mentioned but without solr, just cassandra
>>>>>> (Indeed, just for equality clauses)
>>>>>>
>>>>>> video_by_actor;
>>>>>> video_by_producer;
>>>>>> video_by_music;
>>>>>> video_by_actor_and_producer;
>>>>>> video_by_actor_and_music;
>>>>>>
>>>>>> For queries number 6 you need a search engine.
>>>>>>
>>>>>> SOL
>>>>>> ElasticSearch
>>>>>> cassandra-lucene-index
>>>>>> <https://github.com/stratio/cassandra-lucene-index>
>>>>>> SASI
>>>>>> <http://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/cql_commands/cqlCreateCustomIndex.html>
>>>>>>
>>>>>> I think, just for your query,  the easiest way to get it is to build
>>>>>> a SASI index.
>>>>>> TLDR: I work for stratio in cassandra-lucene-index but for your basic
>>>>>> query (only one dimension), SASI indexes will work for you.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Eduardo Alonso
>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>> 28224 Pozuelo de Alarcón, Madrid
>>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com
>>>>>>  // *@stratiobd <https://twitter.com/StratioBD>*
>>>>>>
>>>>>> 2017-06-12 9:50 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.com>:
>>>>>>
>>>>>>> But Condition is , I am working with Apache Cassandra Database
in
>>>>>>> which I have to store my data into Cassandra and then have to
implement
>>>>>>> partial search capability.
>>>>>>> If we need to search based on full search  primary key, then
it
>>>>>>> really best and easy to work with Cassandra , but in case of
flexible
>>>>>>> search , I am getting confused.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin <
>>>>>>> oskar.kjellin@gmail.com> wrote:
>>>>>>>
>>>>>>>> I haven't run solr with Cassandra myself. I just meant to
run
>>>>>>>> elasticsearch as a completely separate service and write
there as well.
>>>>>>>>
>>>>>>>> On 12 Jun 2017, at 09:45, @Nandan@ <nandanpriyadarshi298@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Do you mean to use Elastic Search with Cassandra?
>>>>>>>> Even I am thinking to use Apache Solr With Cassandra.
>>>>>>>> In that case I have to create distributed tables such as:-
>>>>>>>> 1) video_by_title, video_by_actor, video_by_year  etc..
>>>>>>>> 2) After creating Tables , will have to configure solr core
on all
>>>>>>>> tables.
>>>>>>>>
>>>>>>>> Is it like this ?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin <
>>>>>>>> oskar.kjellin@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Why not elasticsearch for this use case? It will make
your life
>>>>>>>>> much simpler
>>>>>>>>>
>>>>>>>>> > On 12 Jun 2017, at 04:40, @Nandan@ <
>>>>>>>>> nandanpriyadarshi298@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > Hi,
>>>>>>>>> >
>>>>>>>>> > Currently, I am working on data modeling for Video
Company in
>>>>>>>>> which we have different types of users as well as different
user
>>>>>>>>> functionality.
>>>>>>>>> > But currently, my concern is about Search video
module based on
>>>>>>>>> different fields.
>>>>>>>>> >
>>>>>>>>> > Query patterns are as below:-
>>>>>>>>> > 1) Select video by actor.
>>>>>>>>> > 2) select video by producer.
>>>>>>>>> > 3) select video by music.
>>>>>>>>> > 4) select video by actor and producer.
>>>>>>>>> > 5) select video by actor and music.
>>>>>>>>> >
>>>>>>>>> > Note: - In short, We want to establish an advanced
search module
>>>>>>>>> by which we can search by anyway and get the desired
results.
>>>>>>>>> >
>>>>>>>>> > During a search , we need partial search also such
that if any
>>>>>>>>> user can search "Harry" title, then we are able to give
them result as all
>>>>>>>>> videos whose
>>>>>>>>> >  title contains "Harry" at any location.
>>>>>>>>> >
>>>>>>>>> > As per my ideas, I have to create separate tables
such as
>>>>>>>>> video_by_actor, video_by_producer etc.. and implement
solr query on all
>>>>>>>>> tables. Otherwise,
>>>>>>>>> > is there any others way by which we can implement
this search
>>>>>>>>> module effectively.
>>>>>>>>> >
>>>>>>>>> > Please suggest.
>>>>>>>>> >
>>>>>>>>> > Best regards,
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message