cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Brown <jasedbr...@gmail.com>
Subject Re: Reg:- Cassandra Data modelling for Search
Date Mon, 12 Jun 2017 11:57:52 GMT
removing dev@ from this conversation, as the thread is more appropriately
for user@

On Mon, Jun 12, 2017 at 4:51 AM, Eduardo Alonso <eduardoalonso@stratio.com>
wrote:

> -Virtual tokens are not recommended when using SOLR or
> cassandra-lucene-index.
>
> If you use your table schema you will not have any problem with partition
> size because your table is *not* a WIDE row table (it does not have
> clustering keys)
> The limit for 1 record with those 15 or 20 columns must not be larger that
> 100MB. You will have enough.
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
> <https://twitter.com/StratioBD>*
>
> 2017-06-12 12:36 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.com>:
>
> > And due to single table videos, maybe it will go with around 15,20
> > columns, then we need to also think very carefully about partition sizes
> > also.
> >
> > On Mon, Jun 12, 2017 at 6:33 PM, @Nandan@ <nandanpriyadarshi298@gmail.
> com>
> > wrote:
> >
> >> Yes this is only Option I am also thinking like this as my second
> >> options. Before this I was thinking to do denormalize table based on
> search
> >> columns, but due to partial search this will be not that effective.
> >>
> >> Now suppose , if we are going with this single table as videos. and
> >> implemented with Solr/Lucene, then need to also care about num_tokens ?
> >>
> >>
> >> On Mon, Jun 12, 2017 at 6:27 PM, Eduardo Alonso <
> >> eduardoalonso@stratio.com> wrote:
> >>
> >>> Using cassandra collections
> >>>
> >>> CREATE TABLE videos (
> >>> videoid uuid primary key,
> >>> title text,
> >>> actor list<text>,
> >>> producer list<text>,
> >>> release_date timestamp,
> >>> description text,
> >>> music text,
> >>> etc...
> >>> );
> >>>
> >>> When using collection you need to take care of its length. Collections
> >>> are designed to store
> >>> <http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_
> collections_c.html>only
> >>> a small amount of data
> >>> <http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_
> collections_c.html>
> >>> .
> >>> 5/10 actors per movie is ok.
> >>>
> >>>
> >>> Eduardo Alonso
> >>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> >>> 28224 Pozuelo de Alarcón, Madrid
> >>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com
//
> *@stratiobd
> >>> <https://twitter.com/StratioBD>*
> >>>
> >>> 2017-06-12 11:54 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.com>:
> >>>
> >>>> So In short we have to go with one single table as videos and put
> >>>> primary key as videoid uuid.
> >>>> But then how can we able to handle multiple actor name and producer
> >>>> name. ?
> >>>>
> >>>> On Mon, Jun 12, 2017 at 5:51 PM, Eduardo Alonso <
> >>>> eduardoalonso@stratio.com> wrote:
> >>>>
> >>>>> Yes, you are right.
> >>>>>
> >>>>> Table denormalization is useful just when you have unique primary
> >>>>> keys, not your case.
> >>>>> Denormalized tables are only different in its primary key, every
> >>>>> denormalized table contains all the data (it just change how it
is
> >>>>> structured). So, if you need to index it, do it with just one table
> (the
> >>>>> one you showed us with videoid as the primary key is ok).
> >>>>>
> >>>>> Solr, Elastic and cassandra-lucene-index are both based on Lucene
and
> >>>>> all of them fulfill all your needs.
> >>>>>
> >>>>> Solr (in DSE) and cassandra-lucene-index
> >>>>> <https://github.com/stratio/cassandra-lucene-index> are very
well
> >>>>> integrated with cassandra using its secondary index interface. If
you
> >>>>> choose elastic search you will need to code the integration (write
> mutex,
> >>>>> both cluster synchronization (imagine something written in cassandra
> but
> >>>>> failed to write in elastic))
> >>>>>
> >>>>> I know i am not the most suitable to recommend you to use our product
> >>>>> cassandra-lucene-index
> >>>>> <https://github.com/stratio/cassandra-lucene-index> but it
is open
> >>>>> source, just take a look.
> >>>>>
> >>>>> Eduardo Alonso
> >>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> >>>>> 28224 Pozuelo de Alarcón, Madrid
> >>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073> // www.stratio.com
> // *@stratiobd
> >>>>> <https://twitter.com/StratioBD>*
> >>>>>
> >>>>> 2017-06-12 11:18 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.com
> >:
> >>>>>
> >>>>>> Hi Eduardo,
> >>>>>>
> >>>>>> And As we are trying to build an advanced search functionality
in
> >>>>>> which we can able to do partial search based on actor, producer,
> director,
> >>>>>> etc. columns.
> >>>>>> So if we do denormalization of tables then we have to create
tables
> >>>>>> such as below :-
> >>>>>> video_by_actor
> >>>>>> video_by_producer
> >>>>>> video_by_director
> >>>>>> video_by_date
> >>>>>> etc..
> >>>>>> By using denormalized, Cassandra only allows us to do equality
> >>>>>> search, but for implementing Partial search we need to implement
> solr on
> >>>>>> all above tables.
> >>>>>>
> >>>>>> This is my thinking, but I think this will be not correct way
to
> >>>>>> implement Apache Solr on all tables.
> >>>>>>
> >>>>>> On Mon, Jun 12, 2017 at 5:11 PM, @Nandan@ <
> >>>>>> nandanpriyadarshi298@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Edurado,
> >>>>>>>
> >>>>>>> As you mentioned queries 1-6 ,
> >>>>>>> In this condition, we have to proceed with a table like
as below :-
> >>>>>>> create table videos (
> >>>>>>> videoid uuid primary key,
> >>>>>>> title text,
> >>>>>>> actor text,
> >>>>>>> producer text,
> >>>>>>> release_date timestamp,
> >>>>>>> description text,
> >>>>>>> music text,
> >>>>>>> etc...
> >>>>>>> );
> >>>>>>> This table will help to store video datas based on PK videoid
and
> >>>>>>> will give uniqeness due to uuid.
> >>>>>>> But as we know , in one movie there are multiple actor,
multiple
> >>>>>>> producer, multiple music worked, So how can we store all
these..
> Only one
> >>>>>>> option will left as to use collection type columns.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Jun 12, 2017 at 4:59 PM, Eduardo Alonso <
> >>>>>>> eduardoalonso@stratio.com> wrote:
> >>>>>>>
> >>>>>>>> TLDR shouldBe *PD
> >>>>>>>>
> >>>>>>>> Eduardo Alonso
> >>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> >>>>>>>> 28224 Pozuelo de Alarcón, Madrid
> >>>>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073>
//
> www.stratio.com
> >>>>>>>>  // *@stratiobd <https://twitter.com/StratioBD>*
> >>>>>>>>
> >>>>>>>> 2017-06-12 10:58 GMT+02:00 Eduardo Alonso <
> >>>>>>>> eduardoalonso@stratio.com>:
> >>>>>>>>
> >>>>>>>>> Hi Nandan:
> >>>>>>>>>
> >>>>>>>>> So, your system must provide these queries:
> >>>>>>>>>
> >>>>>>>>> 1 - SELECT video FROM ... WHERE actor = '...';
> >>>>>>>>> 2 - SELECT video FROM ... WHERE producer = '...';
> >>>>>>>>> 3 - SELECT video FROM ... WHERE music = '...';
> >>>>>>>>> 4 - SELECT video FROM ... WHERE actor = '...' AND
producer
> ='...';
> >>>>>>>>> 5 - SELECT video FROM ... WHERE actor = '...' AND
music = '...';
> >>>>>>>>> 6 - SELECT video WHERE title CONTAINS 'Harry';
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> For queries 1-5 you can get them with just cassandra,
> >>>>>>>>> denormalizing tables just the way your mentioned
but without
> solr, just
> >>>>>>>>> cassandra (Indeed, just for equality clauses)
> >>>>>>>>>
> >>>>>>>>> video_by_actor;
> >>>>>>>>> video_by_producer;
> >>>>>>>>> video_by_music;
> >>>>>>>>> video_by_actor_and_producer;
> >>>>>>>>> video_by_actor_and_music;
> >>>>>>>>>
> >>>>>>>>> For queries number 6 you need a search engine.
> >>>>>>>>>
> >>>>>>>>> SOL
> >>>>>>>>> ElasticSearch
> >>>>>>>>> cassandra-lucene-index
> >>>>>>>>> <https://github.com/stratio/cassandra-lucene-index>
> >>>>>>>>> SASI
> >>>>>>>>> <http://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/
> cql_commands/cqlCreateCustomIndex.html>
> >>>>>>>>>
> >>>>>>>>> I think, just for your query,  the easiest way to
get it is to
> >>>>>>>>> build a SASI index.
> >>>>>>>>> TLDR: I work for stratio in cassandra-lucene-index
but for your
> >>>>>>>>> basic query (only one dimension), SASI indexes will
work for you.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Eduardo Alonso
> >>>>>>>>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> >>>>>>>>> 28224 Pozuelo de Alarcón, Madrid
> >>>>>>>>> Tel: +34 91 828 6473 <+34%20918%2028%2064%2073>
//
> www.stratio.com
> >>>>>>>>>  // *@stratiobd <https://twitter.com/StratioBD>*
> >>>>>>>>>
> >>>>>>>>> 2017-06-12 9:50 GMT+02:00 @Nandan@ <nandanpriyadarshi298@gmail.
> com
> >>>>>>>>> >:
> >>>>>>>>>
> >>>>>>>>>> But Condition is , I am working with Apache
Cassandra Database
> in
> >>>>>>>>>> which I have to store my data into Cassandra
and then have to
> implement
> >>>>>>>>>> partial search capability.
> >>>>>>>>>> If we need to search based on full search  primary
key, then it
> >>>>>>>>>> really best and easy to work with Cassandra
, but in case of
> flexible
> >>>>>>>>>> search , I am getting confused.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jun 12, 2017 at 3:47 PM, Oskar Kjellin
<
> >>>>>>>>>> oskar.kjellin@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I haven't run solr with Cassandra myself.
I just meant to run
> >>>>>>>>>>> elasticsearch as a completely separate service
and write there
> as well.
> >>>>>>>>>>>
> >>>>>>>>>>> On 12 Jun 2017, at 09:45, @Nandan@ <
> >>>>>>>>>>> nandanpriyadarshi298@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Do you mean to use Elastic Search with Cassandra?
> >>>>>>>>>>> Even I am thinking to use Apache Solr With
Cassandra.
> >>>>>>>>>>> In that case I have to create distributed
tables such as:-
> >>>>>>>>>>> 1) video_by_title, video_by_actor, video_by_year
 etc..
> >>>>>>>>>>> 2) After creating Tables , will have to
configure solr core on
> >>>>>>>>>>> all tables.
> >>>>>>>>>>>
> >>>>>>>>>>> Is it like this ?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jun 12, 2017 at 3:19 PM, Oskar Kjellin
<
> >>>>>>>>>>> oskar.kjellin@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Why not elasticsearch for this use case?
It will make your
> life
> >>>>>>>>>>>> much simpler
> >>>>>>>>>>>>
> >>>>>>>>>>>> > On 12 Jun 2017, at 04:40, @Nandan@
<
> >>>>>>>>>>>> nandanpriyadarshi298@gmail.com> wrote:
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Hi,
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Currently, I am working on data
modeling for Video Company
> in
> >>>>>>>>>>>> which we have different types of users
as well as different
> user
> >>>>>>>>>>>> functionality.
> >>>>>>>>>>>> > But currently, my concern is about
Search video module based
> >>>>>>>>>>>> on different fields.
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Query patterns are as below:-
> >>>>>>>>>>>> > 1) Select video by actor.
> >>>>>>>>>>>> > 2) select video by producer.
> >>>>>>>>>>>> > 3) select video by music.
> >>>>>>>>>>>> > 4) select video by actor and producer.
> >>>>>>>>>>>> > 5) select video by actor and music.
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Note: - In short, We want to establish
an advanced search
> >>>>>>>>>>>> module by which we can search by anyway
and get the desired
> results.
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > During a search , we need partial
search also such that if
> >>>>>>>>>>>> any user can search "Harry" title, then
we are able to give
> them result as
> >>>>>>>>>>>> all videos whose
> >>>>>>>>>>>> >  title contains "Harry" at any
location.
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > As per my ideas, I have to create
separate tables such as
> >>>>>>>>>>>> video_by_actor, video_by_producer etc..
and implement solr
> query on all
> >>>>>>>>>>>> tables. Otherwise,
> >>>>>>>>>>>> > is there any others way by which
we can implement this
> search
> >>>>>>>>>>>> module effectively.
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Please suggest.
> >>>>>>>>>>>> >
> >>>>>>>>>>>> > Best regards,
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Mime
View raw message