From: Stephen.M.Thompson@wellsfargo.com
To: user@cassandra.apache.org
Date: Tue, 11 Dec 2012 17:45:22 -0500
Subject: RE: Primary/secondary index question / best practices?

Dean, thank you for your response. To the second half of the query, I'm a little concerned about the secondary index approach, since the columns I want to index have high entropy.

For example, I would like to query by user name and IP address, values which are decidedly NOT like the pattern recommended in the Secondary Index field. The 8-10 columns I need to search by all have a similarly high scatter rate. Since the documentation seems to suggest that this is a bad idea, what would the correct pattern look like?

In an RDBMS I would just slap an alternate key index on the table and let it roll. It seems like maybe that is not the right approach for Cassandra?

Thanks again,
Steve

-----Original Message-----
From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov]
Sent: Tuesday, December 11, 2012 4:57 PM
To: user@cassandra.apache.org
Subject: Re: Primary/secondary index question / best practices?

Hard to help out on a design without specifics, but here is some advice based on the limited information.

Primary key: yes, must be cluster unique. TimeUUID or UUID... PlayOrm has very unique TimeUUID-like keys, as in this one: 7AL2S8Y.b1 (b1 is the hostname and the prefix is a "unique" timestamp, but generated as a shorter string; ah, nice readable primary keys).

There are some patterns you can look into here that may help: https://github.com/deanhiller/playorm/wiki/Patterns-Page

If you can partition your data virtually, it may help a lot so you can query into the partitions.

Later,
Dean

From: "Stephen.M.Thompson@wellsfargo.com"
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, December 11, 2012 2:49 PM
To: "user@cassandra.apache.org"
Subject: Primary/secondary index question / best practices?

From my reading, it seems like I need a UUID column that will be my primary index, and then I should set up secondary indexes on the 8-10 primary search columns. Am I understanding this correctly? Any advice you can offer on this would be tremendously helpful. I'm quite limited in how specific I can be about the data, of course.
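
Dean's description of the 7AL2S8Y.b1 key (a shortened "unique" timestamp followed by the hostname) can be sketched roughly in Python. This is only an illustration under stated assumptions, not PlayOrm's actual algorithm: the base-36 encoding, the millisecond clock, and the names to_base36 and KeyGenerator are all hypothetical.

```python
import threading
import time

_BASE36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n):
    """Encode a non-negative integer as an upper-case base-36 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 36)
        digits.append(_BASE36[remainder])
    return "".join(reversed(digits))


class KeyGenerator:
    """Hypothetical generator of '<short timestamp>.<host id>' keys.

    Assumption: keys are unique across the cluster only if every host
    uses a distinct host_id and runs a single generator instance.
    """

    def __init__(self, host_id, clock=None):
        self._host_id = host_id
        # Assumed clock: milliseconds since the Unix epoch.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last = -1
        self._lock = threading.Lock()

    def next_key(self):
        with self._lock:
            # Never reuse a value, even if called twice in the same
            # millisecond or if the clock steps backwards.
            self._last = max(self._clock(), self._last + 1)
            return "{}.{}".format(to_base36(self._last), self._host_id)


if __name__ == "__main__":
    gen = KeyGenerator("b1")
    print(gen.next_key())
    print(gen.next_key())
```

The per-host suffix is what lets each node hand out keys without coordinating with the others, while the strictly increasing prefix keeps keys from a single host distinct.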
