cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "@Nandan@" <nandanpriyadarshi...@gmail.com>
Subject Re: Reg:- Cassandra Data modelling for Search
Date Tue, 13 Jun 2017 01:05:25 GMT
Ok , Then let's try to implement and will check by using cassandra-stress
to check what will be performance.
I worked on another data model for book storage for my company, with same
situations like having 1 single table with 80 columns and primary key as
bookid uuid.  Implemented Solr on top of that.  That's why , I am try to
implement all possible best solution for upcoming projects.


On Mon, Jun 12, 2017 at 7:51 PM, Eduardo Alonso <eduardoalonso@stratio.com>
wrote:

> -Virtual tokens are not recommended when using SOLR or
> cassandra-lucene-index.
>
> If you use your table schema you will not have any problem with partition
> size because your table is *not* a WIDE row table (it does not have
> clustering keys)
> The limit for 1 record with those 15 or 20 columns must not be larger that
> 100MB. You will have enough.
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com // *@stratiobd
> <https://twitter.com/StratioBD>*
>
> 2017-06-12 12:36 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.com>:
>
>> And due to single table videos, maybe it will go with around 15,20
>> columns, then we need to also think very carefully about partition sizes
>> also.
>>
>> On Mon, Jun 12, 2017 at 6:33 PM, @Nandan@ <nandanpriyadarshi298@gmail.com
>> > wrote:
>>
>>> Yes this is only Option I am also thinking like this as my second
>>> options. Before this I was thinking to do denormalize table based on search
>>> columns, but due to partial search this will be not that effective.
>>>
>>> Now suppose , if we are going with this single table as videos. and
>>> implemented with Solr/Lucene, then need to also care about num_tokens ?
>>>
>>>
>>> On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso <
>>> eduardoalonso@stratio.com> wrote:
>>>
>>>> Using cassandra collections
>>>>
>>>> CREATE TABLE videos (
>>>> videoid uuid primary key,
>>>> title text,
>>>> actor list<text>,
>>>> producer list<text>,
>>>> release_date timestamp,
>>>> description text,
>>>> music text,
>>>> etc...
>>>> );
>>>>
>>>> When using collection you need to take care of its length. Collections
>>>> are designed to store
>>>> <http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html>only
>>>> a small amount of data
>>>> <http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html>
>>>> .
>>>> 5/10 actors per movie is ok.
>>>>
>>>>
>>>> Eduardo Alonso
>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>> 28224 Pozuelo de Alarcón, Madrid
>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com
// *@stratiobd
>>>> <https://twitter.com/StratioBD>*
>>>>
>>>> 2017-06-12 11:54 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.com>:
>>>>
>>>>> So In short we have to go with one single table as videos and put
>>>>> primary key as videoid uuid.
>>>>> But then how can we able to handle multiple actor name and producer
>>>>> name. ?
>>>>>
>>>>> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso <
>>>>> eduardoalonso@stratio.com> wrote:
>>>>>
>>>>>> Yes, you are right.
>>>>>>
>>>>>> Table denormalization is useful just when you have unique primary
>>>>>> keys, not your case.
>>>>>> Denormalized tables are only different in its primary key, every
>>>>>> denormalized table contains all the data (it just change how it is
>>>>>> structured). So, if you need to index it, do it with just one table
(the
>>>>>> one you showed us with videoid as the primary key is ok).
>>>>>>
>>>>>> Solr, Elastic and cassandra-lucene-index are both based on Lucene
and
>>>>>> all of them fulfill all your needs.
>>>>>>
>>>>>> Solr (in DSE) and cassandra-lucene-index
>>>>>> <https://github.com/stratio/cassandra-lucene-index> are very
well
>>>>>> integrated with cassandra using its secondary index interface. If
you
>>>>>> choose elastic search you will need to code the integration (write
mutex,
>>>>>> both cluster synchronization (imagine something written in cassandra
but
>>>>>> failed to write in elastic))
>>>>>>
>>>>>> I know i am not the most suitable to recommend you to use our product
>>>>>> cassandra-lucene-index
>>>>>> <https://github.com/stratio/cassandra-lucene-index> but it
is open
>>>>>> source, just take a look.
>>>>>>
>>>>>> Eduardo Alonso
>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>> 28224 Pozuelo de Alarcón, Madrid
>>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com
>>>>>>  // *@stratiobd <https://twitter.com/StratioBD>*
>>>>>>
>>>>>> 2017-06-12 11:18 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.com>:
>>>>>>
>>>>>>> Hi Eduardo,
>>>>>>>
>>>>>>> And As we are trying to build an advanced search functionality
in
>>>>>>> which we can able to do partial search based on actor, producer,
director,
>>>>>>> etc. columns.
>>>>>>> So if we do denormalization of tables then we have to create
tables
>>>>>>> such as below :-
>>>>>>> video_by_actor
>>>>>>> video_by_producer
>>>>>>> video_by_director
>>>>>>> video_by_date
>>>>>>> etc..
>>>>>>> By using denormalized, Cassandra only allows us to do equality
>>>>>>> search, but for implementing Partial search we need to implement
solr on
>>>>>>> all above tables.
>>>>>>>
>>>>>>> This is my thinking, but I think this will be not correct way
to
>>>>>>> implement Apache Solr on all tables.
>>>>>>>
>>>>>>> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ <
>>>>>>> nandanpriyadarshi298@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Edurado,
>>>>>>>>
>>>>>>>> As you mentioned queries 1-6 ,
>>>>>>>> In this condition, we have to proceed with a table like as
below :-
>>>>>>>> create table videos (
>>>>>>>> videoid uuid primary key,
>>>>>>>> title text,
>>>>>>>> actor text,
>>>>>>>> producer text,
>>>>>>>> release_date timestamp,
>>>>>>>> description text,
>>>>>>>> music text,
>>>>>>>> etc...
>>>>>>>> );
>>>>>>>> This table will help to store video datas based on PK videoid
and
>>>>>>>> will give uniqeness due to uuid.
>>>>>>>> But as we know , in one movie there are multiple actor, multiple
>>>>>>>> producer, multiple music worked, So how can we store all
these.. Only one
>>>>>>>> option will left as to use collection type columns.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso <
>>>>>>>> eduardoalonso@stratio.com> wrote:
>>>>>>>>
>>>>>>>>> TLDR shouldBe *PD
>>>>>>>>>
>>>>>>>>> Eduardo Alonso
>>>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>>>>> 28224 Pozuelo de Alarcón, Madrid
>>>>>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073>
// www.stratio.com
>>>>>>>>>  // *@stratiobd <https://twitter.com/StratioBD>*
>>>>>>>>>
>>>>>>>>> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso <
>>>>>>>>> eduardoalonso@stratio.com>:
>>>>>>>>>
>>>>>>>>>> Hi Nandan:
>>>>>>>>>>
>>>>>>>>>> So, your system must provide these queries:
>>>>>>>>>>
>>>>>>>>>> 1 - SELECT video FROM ... WHERE actor = '...';
>>>>>>>>>> 2 - SELECT video FROM ... WHERE producer = '...';
>>>>>>>>>> 3 - SELECT video FROM ... WHERE music = '...';
>>>>>>>>>> 4 - SELECT video FROM ... WHERE actor = '...' AND
producer
>>>>>>>>>> ='...';
>>>>>>>>>> 5 - SELECT video FROM ... WHERE actor = '...' AND
music = '...';
>>>>>>>>>> 6 - SELECT video WHERE title CONTAINS 'Harry';
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For queries 1-5 you can get them with just cassandra,
>>>>>>>>>> denormalizing tables just the way your mentioned
but without solr, just
>>>>>>>>>> cassandra (Indeed, just for equality clauses)
>>>>>>>>>>
>>>>>>>>>> video_by_actor;
>>>>>>>>>> video_by_producer;
>>>>>>>>>> video_by_music;
>>>>>>>>>> video_by_actor_and_producer;
>>>>>>>>>> video_by_actor_and_music;
>>>>>>>>>>
>>>>>>>>>> For queries number 6 you need a search engine.
>>>>>>>>>>
>>>>>>>>>> SOL
>>>>>>>>>> ElasticSearch
>>>>>>>>>> cassandra-lucene-index
>>>>>>>>>> <https://github.com/stratio/cassandra-lucene-index>
>>>>>>>>>> SASI
>>>>>>>>>> <http://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/cql_commands/cqlCreateCustomIndex.html>
>>>>>>>>>>
>>>>>>>>>> I think, just for your query,  the easiest way to
get it is to
>>>>>>>>>> build a SASI index.
>>>>>>>>>> TLDR: I work for stratio in cassandra-lucene-index
but for your
>>>>>>>>>> basic query (only one dimension), SASI indexes will
work for you.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Eduardo Alonso
>>>>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>>>>>>>>>> 28224 Pozuelo de Alarcón, Madrid
>>>>>>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073>
//
>>>>>>>>>> www.stratio.com // *@stratiobd <https://twitter.com/StratioBD>*
>>>>>>>>>>
>>>>>>>>>> 2017-06-12 9:50 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.co
>>>>>>>>>> m>:
>>>>>>>>>>
>>>>>>>>>>> But Condition is , I am working with Apache Cassandra
Database
>>>>>>>>>>> in which I have to store my data into Cassandra
and then have to implement
>>>>>>>>>>> partial search capability.
>>>>>>>>>>> If we need to search based on full search  primary
key, then it
>>>>>>>>>>> really best and easy to work with Cassandra ,
but in case of flexible
>>>>>>>>>>> search , I am getting confused.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin
<
>>>>>>>>>>> oskar.kjellin@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I haven't run solr with Cassandra myself.
I just meant to run
>>>>>>>>>>>> elasticsearch as a completely separate service
and write there as well.
>>>>>>>>>>>>
>>>>>>>>>>>> On 12 Jun 2017, at 09:45, @Nandan@ <
>>>>>>>>>>>> nandanpriyadarshi298@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Do you mean to use Elastic Search with Cassandra?
>>>>>>>>>>>> Even I am thinking to use Apache Solr With
Cassandra.
>>>>>>>>>>>> In that case I have to create distributed
tables such as:-
>>>>>>>>>>>> 1) video_by_title, video_by_actor, video_by_year
 etc..
>>>>>>>>>>>> 2) After creating Tables , will have to configure
solr core on
>>>>>>>>>>>> all tables.
>>>>>>>>>>>>
>>>>>>>>>>>> Is it like this ?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin
<
>>>>>>>>>>>> oskar.kjellin@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Why not elasticsearch for this use case?
It will make your
>>>>>>>>>>>>> life much simpler
>>>>>>>>>>>>>
>>>>>>>>>>>>> > On 12 Jun 2017, at 04:40, @Nandan@
<
>>>>>>>>>>>>> nandanpriyadarshi298@gmail.com> wrote:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Currently, I am working on data
modeling for Video Company
>>>>>>>>>>>>> in which we have different types of users
as well as different user
>>>>>>>>>>>>> functionality.
>>>>>>>>>>>>> > But currently, my concern is about
Search video module based
>>>>>>>>>>>>> on different fields.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Query patterns are as below:-
>>>>>>>>>>>>> > 1) Select video by actor.
>>>>>>>>>>>>> > 2) select video by producer.
>>>>>>>>>>>>> > 3) select video by music.
>>>>>>>>>>>>> > 4) select video by actor and producer.
>>>>>>>>>>>>> > 5) select video by actor and music.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Note: - In short, We want to establish
an advanced search
>>>>>>>>>>>>> module by which we can search by anyway
and get the desired results.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > During a search , we need partial
search also such that if
>>>>>>>>>>>>> any user can search "Harry" title, then
we are able to give them result as
>>>>>>>>>>>>> all videos whose
>>>>>>>>>>>>> >  title contains "Harry" at any location.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > As per my ideas, I have to create
separate tables such as
>>>>>>>>>>>>> video_by_actor, video_by_producer etc..
and implement solr query on all
>>>>>>>>>>>>> tables. Otherwise,
>>>>>>>>>>>>> > is there any others way by which
we can implement this
>>>>>>>>>>>>> search module effectively.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Please suggest.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Best regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message