incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: Primary/secondary index question / best practices?
Date Tue, 11 Dec 2012 23:26:13 GMT
Is there any column that would be a good qualifier as a partition key?

Some people partition by time, such as every month or every day, and then you can either maintain
your own secondary indexes that you query into (high entropy is NOT a big deal here), let PlayOrm
maintain some for you, or use CQL.

Another partitioning scheme is to partition by client.

The goal is to keep each partition under roughly 5 million rows so that your wide-row
index does not grow too large.
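
To make the idea concrete, here is a rough Python sketch of partitioning by month and keeping one index per partition and column. It is only an in-memory illustration, not Cassandra or PlayOrm code: the names month_partition and PartitionedIndex, the column names, and the row keys are all made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_partition(ts: datetime) -> str:
    """Hypothetical partition key: one partition per calendar month."""
    return ts.strftime("%Y-%m")


class PartitionedIndex:
    """In-memory stand-in for one wide index row per (partition, column)."""

    def __init__(self):
        # (partition, column name) -> {indexed value -> set of row keys}
        self._rows = defaultdict(lambda: defaultdict(set))

    def add(self, partition, column, value, row_key):
        """Record that row_key has `value` in `column` within `partition`."""
        self._rows[(partition, column)][value].add(row_key)

    def lookup(self, partition, column, value):
        """Return the row keys in one partition matching column == value."""
        index_row = self._rows.get((partition, column), {})
        return sorted(index_row.get(value, ()))


index = PartitionedIndex()
part = month_partition(datetime(2012, 12, 11, 23, 26, tzinfo=timezone.utc))

# High-entropy values (user names, IP addresses) are fine here, because
# each lookup only touches the index of a single partition.
index.add(part, "ip_address", "10.0.0.7", "row-1")
index.add(part, "user_name", "steve", "row-1")

print(index.lookup(part, "ip_address", "10.0.0.7"))
```

Each lookup names the partition first, so the size of any one index is bounded by the number of rows in that partition rather than by the whole data set.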


Dean

From: "Stephen.M.Thompson@wellsfargo.com" <Stephen.M.Thompson@wellsfargo.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, December 11, 2012 3:45 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Primary/secondary index question / best practices?


Dean, thank you for your response.  To the second half of the query, I’m a little concerned
about the secondary index approach since the indexes that I want to create are columns with
high entropy.



For example, I would like to query by User name and IP address, values which are decidedly
NOT like the pattern recommended in the Secondary Index field.  The 8-10 columns I need to
search by all have a similarly high scatter rate.  Since the documentation seems to suggest that
this is a bad idea, what would the correct pattern look like?



In an RDBMS I would just slap an alternate key index on the table and let it roll.   It seems
like maybe that is not the right approach for Cassandra?



Thanks again,

Steve



-----Original Message-----
From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov]
Sent: Tuesday, December 11, 2012 4:57 PM
To: user@cassandra.apache.org
Subject: Re: Primary/secondary index question / best practices?



Hard to help out on a design without specifics, but here is some advice based on the limited
information.



Primary key: yes, it must be unique across the cluster.  Use a TimeUUID or UUID.  PlayOrm generates
unique TimeUUID-like keys such as 7AL2S8Y.b1, where b1 is the hostname and the prefix is a "unique"
timestamp encoded as a shorter string (ah, nice readable primary keys).
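
As a small illustration of a time-based, cluster-unique key, the Python sketch below uses the standard library's uuid.uuid1(), which builds a version 1 UUID from a timestamp and a node identifier. This is not how PlayOrm generates its shorter keys; the helper name new_row_key is made up for the example.

```python
import uuid


def new_row_key() -> uuid.UUID:
    """Hypothetical helper: a version 1 (time-based) UUID as the row key."""
    return uuid.uuid1()


key = new_row_key()
print(key)          # the key in its canonical string form
print(key.version)  # 1, i.e. a time-based UUID
print(key.time)     # the 60-bit timestamp embedded in the key
```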



There are some patterns you can look into here that may help: https://github.com/deanhiller/playorm/wiki/Patterns-Page



If you can partition your data virtually, it may help a lot so you can query into the partitions.



Later,

Dean



From: "Stephen.M.Thompson@wellsfargo.com" <Stephen.M.Thompson@wellsfargo.com>

Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>

Date: Tuesday, December 11, 2012 2:49 PM

To: "user@cassandra.apache.org" <user@cassandra.apache.org>

Subject: Primary/secondary index question / best practices?



From my reading, it seems like I need a UUID column that will be my primary index, and then I
should set up secondary indexes on the 8-10 primary search columns.  Am I understanding this
correctly?  Any advice you can offer on this would be tremendously helpful.  I’m quite limited
in how specific I can be about the data, of course.
