incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: Primary/secondary index question / best practices?
Date Tue, 11 Dec 2012 23:39:43 GMT
Oh, and one last thing... there is no limit on the number of partitions, just on
partition size really.

Dean

On 12/11/12 4:26 PM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

>Is there any column that would be a good qualifier as a partition key?
>
>Some people partition by time, for example one partition per month or per
>day. You can then either maintain your own secondary indexes that you query
>into (high entropy is NOT a big deal here), let PlayOrm maintain some for
>you, or use CQL.
>
>Another scheme is to partition by client.
>
>The goal is to keep each partition under roughly 5 million rows so your
>wide row index does not grow too large.
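
A minimal Python sketch of the partitioning idea above, not taken from the thread: rows are bucketed by client and calendar month, and each partition keeps its own sorted index per column, so a lookup on a high-entropy value such as an IP address only touches one partition's index. The names partition_key and PartitionedIndex and the "client:YYYY-MM" key format are assumptions for illustration. The in-memory dict only stands in for wide index rows in Cassandra; it is not how PlayOrm or CQL implement indexing.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(client_id: str, ts: datetime) -> str:
    """Build a hypothetical partition id from a client and a calendar month."""
    return f"{client_id}:{ts.strftime('%Y-%m')}"


class PartitionedIndex:
    """In-memory stand-in for one wide index row per (partition, column)."""

    def __init__(self):
        # (partition, column) -> sorted list of (value, row_key) pairs
        self._rows = defaultdict(list)

    def add(self, partition: str, column: str, value: str, row_key: str) -> None:
        # Keep entries sorted so equal values sit next to each other.
        insort(self._rows[(partition, column)], (value, row_key))

    def lookup(self, partition: str, column: str, value: str) -> list:
        entries = self._rows.get((partition, column), [])
        # (value,) sorts before every (value, row_key), so this finds the
        # first entry for the value, if there is one.
        i = bisect_left(entries, (value,))
        matches = []
        while i < len(entries) and entries[i][0] == value:
            matches.append(entries[i][1])
            i += 1
        return matches


index = PartitionedIndex()
december = partition_key("acme", datetime(2012, 12, 11, tzinfo=timezone.utc))
index.add(december, "ip", "10.0.0.1", "row1")
index.add(december, "ip", "10.0.0.2", "row2")
index.add(december, "ip", "10.0.0.1", "row3")
print(index.lookup(december, "ip", "10.0.0.1"))
```

Because the index is scoped to a partition, its size is bounded by the number of rows in that partition rather than by the whole data set, which is why the entropy of the indexed column matters less here.
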
>
>
>Dean
>
>From: Stephen.M.Thompson@wellsfargo.com
>Reply-To: user@cassandra.apache.org
>Date: Tuesday, December 11, 2012 3:45 PM
>To: user@cassandra.apache.org
>Subject: RE: Primary/secondary index question / best practices?
>
>
>Dean, thank you for your response.  To the second half of the query, I'm
>a little concerned about the secondary index approach since the indexes
>that I want to create are columns with high entropy.
>
>
>
>For example, I would like to query by User name and IP address, values
>which are decidedly NOT like the pattern recommended in the Secondary
>Index field.  The 8-10 columns I need to search by all have a similarly
>high scatter rate.  Since the documentation seems to suggest that this
>is a bad idea, what would the correct pattern look like?
>
>
>
>In an RDBMS I would just slap an alternate key index on the table and let
>it roll.   It seems like maybe that is not the right approach for
>Cassandra?
>
>
>
>Thanks again,
>
>Steve
>
>
>
>-----Original Message-----
>From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov]
>Sent: Tuesday, December 11, 2012 4:57 PM
>To: user@cassandra.apache.org
>Subject: Re: Primary/secondary index question / best practices?
>
>
>
>Hard to help out on a design without specifics, but here is some advice
>based on the limited information.
>
>
>
>Primary key: yes, it must be unique across the cluster.  TimeUUID or
>UUID... PlayOrm has TimeUUID-like keys such as 7AL2S8Y.b1 (b1 is the
>hostname, and the prefix is a "unique" timestamp rendered as a shorter
>string; ah, nice readable primary keys).
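
For the TimeUUID option, a small Python sketch using the standard library's uuid.uuid1(), which produces a version 1 (time-based) UUID. The helper names new_row_key and key_timestamp are hypothetical, and this does not reproduce PlayOrm's shorter key format.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100 ns intervals since 1582-10-15 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def new_row_key() -> uuid.UUID:
    """Return a time-based (version 1) UUID to use as a row key."""
    return uuid.uuid1()


def key_timestamp(key: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version 1 UUID."""
    # key.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=key.time // 10)


key = new_row_key()
print(key, key_timestamp(key).isoformat())
```
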
>
>
>
>There are some patterns you can look into here that may help:
>https://github.com/deanhiller/playorm/wiki/Patterns-Page
>
>
>
>If you can partition your data virtually, it may help a lot so you can
>query into the partitions.
>
>
>
>Later,
>
>Dean
>
>
>
>From: Stephen.M.Thompson@wellsfargo.com
>
>Reply-To: user@cassandra.apache.org
>
>Date: Tuesday, December 11, 2012 2:49 PM
>
>To: user@cassandra.apache.org
>
>Subject: Primary/secondary index question / best practices?
>
>
>
>From my reading, it seems like I need a UUID column that will be my primary
>index, and then I should set up secondary indexes on the 8-10 primary
>search columns.  Am I understanding this correctly?  Any advice you can
>offer on this would be tremendously helpful.  I'm quite limited in how
>specific I can be about the data, of course.

