Hi folks – I’m doing an informal proof-of-concept with Cassandra, and I’ve been getting conflicting information about how I should lay out my data. Perhaps somebody could point me in the right direction.
I have a column family that will hold billions of rows. The data have no intrinsic unique identifier. A given row will have, say, 50 columns, and I’ll need to be able to query efficiently on 8-10 of them.
I’ve been told that I should just pick the most commonly searched column and make that my primary key, even though it will not be unique. That seems contrary to the documentation I’m seeing online.
From my reading, it seems like I need a UUID column as my primary key, and then I should set up secondary indexes on the 8-10 main search columns. Am I understanding this correctly? Any advice you can offer would be tremendously helpful. I’m quite limited in how specific I can be about the data, of course.