incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Index and schema review
Date Wed, 08 Feb 2012 19:57:22 GMT
> 1. Are the indexes local? i.e. if node 1 holds, say, 10 keys, will it only have indexes
> for those 10 keys? In short, how is the index partitioned?
Yes, nodes only hold the secondary indexes for the rows they are a replica for. That covers
the node's own token range plus the token ranges it replicates for other nodes.
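
As a rough illustration of what "local" means here, a toy Python sketch. It is not Cassandra code: the placement function, the node names and the replication factor are all made up for the example, and real token assignment works differently. It only shows that each node indexes the rows it holds a replica of, so an indexed lookup has to combine the answers from several nodes.

```python
from collections import defaultdict

def replicas_for(key, nodes, rf):
    """Toy placement (an assumption): rf consecutive nodes from a simple hash."""
    start = sum(key.encode()) % len(nodes)  # deterministic stand-in for a token
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]

def build_local_indexes(rows, nodes, rf):
    """Each node indexes only the rows it holds a replica of."""
    indexes = {n: defaultdict(set) for n in nodes}
    for key, indexed_value in rows.items():
        for node in replicas_for(key, nodes, rf):
            indexes[node][indexed_value].add(key)
    return indexes

def query_by_value(indexes, value):
    """An indexed lookup merges the matches found in the nodes' local indexes."""
    found = set()
    for local_index in indexes.values():
        found |= local_index.get(value, set())
    return found

# Hypothetical data: row key -> indexed date column
rows = {"a1": "2012-02-08", "a2": "2012-02-08", "a3": "2012-02-07"}
nodes = ["n1", "n2", "n3", "n4"]
indexes = build_local_indexes(rows, nodes, rf=2)
matches = query_by_value(indexes, "2012-02-08")
```

No single node in this sketch can answer the date query on its own, which is the cost to keep in mind when relying on secondary indexes for the main read paths.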

IMHO you should try to model the well-known requests without using secondary indexes. This
is not always possible, but it gives the best performance.

A lot depends on the shape of the data, but I would think about:

- Partitioning time series data: http://www.slideshare.net/mattdennis/cassandra-data-modeling
- Using composite columns to store all the B's and C's in the same row as the A.
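
To make the two ideas above concrete, here is a minimal Python sketch that only models the layout in plain dictionaries. It does not use a Cassandra client. The daily bucket size, the key format, and the names `day_bucket_keys` and `tree_row` are assumptions for illustration, not something from the slides.

```python
from datetime import date, timedelta

def day_bucket_keys(today, days):
    """Row keys for the last `days` daily buckets, newest first.

    Assumes one row per day; preloading 3 days of data then means
    reading 3 known rows instead of querying a date index.
    """
    return [(today - timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]

def tree_row(b_to_cs):
    """Model one wide row for an A, keyed by composite column names.

    Column name is (b_id, c_id); an empty c_id marks the B itself, and it
    sorts before that B's C columns.
    """
    columns = {}
    for b_id, (b_bytes, cs) in b_to_cs.items():
        columns[(b_id, "")] = b_bytes
        for c_id, c_bytes in cs.items():
            columns[(b_id, c_id)] = c_bytes
    return dict(sorted(columns.items()))  # columns are kept sorted by name

# Hypothetical ids and payloads
buckets = day_bucket_keys(date(2012, 2, 8), 3)
row_for_a = tree_row({
    "b2": (b"B2", {"c9": b"C9"}),
    "b1": (b"B1", {"c2": b"C2", "c1": b"C1"}),
})
```

With this shape, "given the Id of A, get all the B's and C's" becomes a read of a single row, and the 3-day preload becomes a read of three bucket rows, neither of which needs a secondary index.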

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/02/2012, at 10:11 PM, Tiwari, Dushyant wrote:

> Hi Cassandra Users,
>  
> I am considering Cassandra as our main data store. Here is a quick description of the entity
> structure we need to design a schema for. We have 3 entities: A, B and C. A has a one-to-many
> relationship with B, and the same is true for B and C. This gives us a tree-like structure
> with A as the root. To store this structure in Cassandra I am looking at the following
> column family schema design.
>  
>  
> Col Family : For storing A
> Id of A - Key
> Date – Indexed
> Byte format of A object
>  
> Col Family : For storing B
> Id of B - Key
> Date – Indexed field
> Id of A to which B belongs – Indexed
> Byte format of B object
>  
> Col Family : For storing C
> Id of C - Key
> Date – Indexed field
> Id of A to which C belongs – Indexed
> Id of B to which C belongs – Indexed
> Byte format of C object
>  
>  
> I am maintaining an index on date because the caches are supposed to be preloaded with, say,
> 3 days' worth of data. Now the questions are:
> 1. Are the indexes local? i.e. if node 1 holds, say, 10 keys, will it only have indexes
> for those 10 keys? In short, how is the index partitioned?
>    - To illustrate the concern, consider the case where we receive 100 keys and have
>      10 Cassandra nodes. Assuming an even distribution of 10 keys per node, will there be
>      10 partitions of the index on 10 different nodes, each covering only the keys that node owns?
>  
> 2. Opinions on the schema design above. The use cases are:
>    - Preload the processes on startup with 3 days of data. (The data store should hold
>      data dating back as far as 3 years.)
>    - Given the Id of A, get all the B’s and C’s of the tree.
>  
>  
> Hoping to hear soon.
>  
> Thanks and Regards,
> Dushyant

