cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Leleu Eric <Eric.Le...@worldline.com>
Subject RE: Secondary index or dedicated CF?
Date Fri, 22 Aug 2014 17:46:36 GMT
Thanks you for your feedbacks.

De : Mark Reddy [mailto:mark.l.reddy@gmail.com]
Envoyé : vendredi 22 août 2014 17:08
À : user@cassandra.apache.org
Objet : Re: Secondary index or dedicated CF?

Hi,

As a general rule of thumb I would steer clear of secondary indexes, this is also the official
stand that DataStax take (see p5 of their best practices doc: http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-Best-Practices.pdf).

"It is best to avoid using Cassandra's built-in secondary indexes where possible. Instead,
it is recommended to denormalize data and manually maintain a dynamic table as a form of an
index instead of using a secondary index. If and when secondary indexes are to be used, they
should be created only on columns containing low-cardinality data (for example: fields with
less than 1000 states)."

Mark

On 22 Aug 2014, at 15:58, DuyHai Doan <doanduyhai@gmail.com<mailto:doanduyhai@gmail.com>>
wrote:


Hello Eric

"Under the hood what is the difference of the both solutions?"

 1. Cassandra secondary index: distributed index, supports better high volume of data, the
index itself is distributed so there is no bottleneck. The tradeoff is that depending on the
cardinality of data having the same "bucketname+tenantID" the performance may drop sharply.
Please read this: http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_when_use_index_c.html?scroll=concept_ds_sgh_yzz_zj__when-no-index.
There are several restrictions to secondary index

2. Manual index: easy to design, but potentially wide row and not well balance if  data having
the same "bucketname+tenantID" is very large. Furthermore you need to manage index consistency
manually so that it is synced with source data updates.

 The best thing to do is to benchmark both solutions and takes the approach giving you the
best results. Be careful with benchmarks, it should be representative of the data pattern
you likely have in production.

On Fri, Aug 22, 2014 at 7:47 AM, Leleu Eric <Eric.Leleu@worldline.com<mailto:Eric.Leleu@worldline.com>>
wrote:
Hi,


I'm new with Cassandra and I wondering what is the best design for my case.

I have a set of buckets that contain one or thousands of contents.

Here is my Content CF :

    CREATE TABLE IF NOT EXISTS contents (tenantID varchar,
     key varchar,
    type varchar,
    bucket varchar,
    owner varchar,
    workspace varchar,
     public_read boolean  PRIMARY KEY ((key, tenantID), type, workspace));


To retrieve all contents that belong to a bucket, I have created an index on the bucket column.

    CREATE INDEX IF NOT EXISTS bucket_to_contents ON contents (bucket);

The column value "bucket" is concatenated with the tenantId (bucket = bucketname+tenantID)
in order to avoid filtering on the tenantID on my application.

Is it the rights way to do or should I create another column family to link each content to
the bucket ?

    CREATE TABLE IF NOT EXISTS bucket_to_contents (tenantID varchar,
     key varchar,
    type varchar,
    bucket varchar,
    owner varchar,
    workspace varchar,
     public_read boolean  PRIMARY KEY ((bucket, tenantID), key));

Under the hood what is the difference of the both solutions?

According to my understanding, the result will be the same. Both will have the rowkey equals
to the "bucketname"  and the "tenantID".
Excepted that the secondary index can have a replication delay...

Can you help me on this point?

Regards,
Eric


________________________________

Ce message et les pièces jointes sont confidentiels et réservés à l'usage exclusif de
ses destinataires. Il peut également être protégé par le secret professionnel. Si vous
recevez ce message par erreur, merci d'en avertir immédiatement l'expéditeur et de le détruire.
L'intégrité du message ne pouvant être assurée sur Internet, la responsabilité de Worldline
ne pourra être recherchée quant au contenu de ce message. Bien que les meilleurs efforts
soient faits pour maintenir cette transmission exempte de tout virus, l'expéditeur ne donne
aucune garantie à cet égard et sa responsabilité ne saurait être recherchée pour tout
dommage résultant d'un virus transmis.

This e-mail and the documents attached are confidential and intended solely for the addressee;
it may also be privileged. If you receive this e-mail in error, please notify the sender immediately
and destroy it. As its integrity cannot be secured on the Internet, the Worldline liability
cannot be triggered for the message content. Although the sender endeavours to maintain a
computer virus-free network, the sender does not warrant that this transmission is virus-free
and will not be liable for any damages resulting from any virus transmitted.



________________________________

Ce message et les pièces jointes sont confidentiels et réservés à l'usage exclusif de
ses destinataires. Il peut également être protégé par le secret professionnel. Si vous
recevez ce message par erreur, merci d'en avertir immédiatement l'expéditeur et de le détruire.
L'intégrité du message ne pouvant être assurée sur Internet, la responsabilité de Worldline
ne pourra être recherchée quant au contenu de ce message. Bien que les meilleurs efforts
soient faits pour maintenir cette transmission exempte de tout virus, l'expéditeur ne donne
aucune garantie à cet égard et sa responsabilité ne saurait être recherchée pour tout
dommage résultant d'un virus transmis.

This e-mail and the documents attached are confidential and intended solely for the addressee;
it may also be privileged. If you receive this e-mail in error, please notify the sender immediately
and destroy it. As its integrity cannot be secured on the Internet, the Worldline liability
cannot be triggered for the message content. Although the sender endeavours to maintain a
computer virus-free network, the sender does not warrant that this transmission is virus-free
and will not be liable for any damages resulting from any virus transmitted.

Mime
View raw message