incubator-cassandra-user mailing list archives

From <Stephen.M.Thomp...@wellsfargo.com>
Subject Primary/secondary index question / best practices?
Date Tue, 11 Dec 2012 21:49:41 GMT
Hi folks - I'm doing an informal proof-of-concept with Cassandra and I've been getting some
conflicting information about how my data layout should go.  Perhaps somebody could point
me in the right direction.

I have a column family that will have billions of rows of data.  The data do not have any
unique identifier intrinsically.  A given row will have, say, 50 columns, and I'll need to
be able to efficiently query on 8-10 of them.

I've been told that I should just pick the most common search item and make that my primary
key, even though it will not be unique.  That seems contrary to the documentation I am seeing
online.
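
To make the first suggested layout concrete, here is a toy in-memory sketch. It is not Cassandra's API or storage engine, and the class name WideRowStore is hypothetical. The most commonly searched column becomes the row key, and each record stored under that key gets its own generated UUID, so the key itself does not have to be unique.

```python
import uuid
from collections import defaultdict


class WideRowStore:
    """Hypothetical toy model: a non-unique search column acts as the row key."""

    def __init__(self, key_column):
        self.key_column = key_column
        # row key value -> {generated record UUID -> record dict}
        self.rows = defaultdict(dict)

    def insert(self, record):
        # The UUID only distinguishes records that share the same row key.
        record_id = uuid.uuid4()
        self.rows[record[self.key_column]][record_id] = record
        return record_id

    def query(self, key_value):
        # A single direct lookup by row key; no other rows are examined.
        # dict.get does not create an empty entry in the defaultdict.
        return list(self.rows.get(key_value, {}).values())


store = WideRowStore("region")
store.insert({"region": "west", "amount": 10})
store.insert({"region": "west", "amount": 20})
store.insert({"region": "east", "amount": 5})
print(len(store.query("west")))  # 2 records share the key "west"
```

The trade-off in this sketch is that only queries on the key column are a single lookup. A query on any of the other columns would need a different layout or an index.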

From my reading, it seems like I need a UUID column that will be my primary index, and
then I should set up secondary indexes on the 8-10 primary search columns.  Am I understanding
this correctly?  Any advice you can offer on this would be tremendously helpful.  I'm quite
limited in how specific I can be about the data, of course.
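
The second layout (a UUID as the primary key plus secondary indexes on the search columns) can be sketched with the same kind of toy in-memory model. Again, this is not Cassandra's API. The class name IndexedStore and the column names in the usage lines are hypothetical. Each indexed column maps a value to the set of record UUIDs that hold it, and a query first reads the index and then fetches each matching record by its UUID.

```python
import uuid
from collections import defaultdict


class IndexedStore:
    """Hypothetical toy model: UUID primary key plus per-column secondary indexes."""

    def __init__(self, indexed_columns):
        # record UUID -> record dict (the unique primary key)
        self.rows = {}
        # column name -> {column value -> set of record UUIDs}
        self.indexes = {col: defaultdict(set) for col in indexed_columns}

    def insert(self, record):
        record_id = uuid.uuid4()
        self.rows[record_id] = record
        # Every indexed column present in the record gets an index entry.
        for col, index in self.indexes.items():
            if col in record:
                index[record[col]].add(record_id)
        return record_id

    def query(self, column, value):
        # Step 1: look up matching UUIDs in the secondary index
        # (raises KeyError if the column was never indexed).
        ids = self.indexes[column].get(value, set())
        # Step 2: fetch each matching record by its primary key.
        return [self.rows[i] for i in ids]


store = IndexedStore(["color", "size"])
store.insert({"color": "red", "size": 1})
store.insert({"color": "red", "size": 2})
store.insert({"color": "blue", "size": 1})
print(len(store.query("color", "red")))  # 2
```

Unlike the first sketch, any indexed column can be queried here, but every query takes two steps, and each insert has to update one index per indexed column. The sketch also ignores how rows and indexes are spread across nodes in a real cluster. That distribution is a large part of the cost at billions of rows.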

Steve
