From: "Hiller, Dean"
To: "user@cassandra.apache.org"
Date: Tue, 11 Dec 2012 16:26:13 -0700
Subject: Re: Primary/secondary index question / best practices?

Is there any column that would be a good qualifier as a partition key? Some people partition by time, such as every month or every day, and then you can either have your own secondary indexes that you query into (high entropy is NOT a big deal here), or PlayOrm can do some for you, or you could use CQL as well. Other partitioning schemes are to partition by client.

The goal is to have fewer than about 5 million rows in a partition so your wide row index is not too large.

Dean

From: "Stephen.M.Thompson@wellsfargo.com"
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, December 11, 2012 3:45 PM
To: "user@cassandra.apache.org"
Subject: RE: Primary/secondary index question / best practices?

Dean, thank you for your response. To the second half of the query, I'm a little concerned about the secondary index approach, since the indexes that I want to create are on columns with high entropy.

For example, I would like to query by user name and IP address, values which are decidedly NOT like the pattern recommended in the Secondary Index field. The 8-10 columns I need to search by all have a similarly high scatter rate. Since the documentation seems to suggest that this is a bad idea, what would the correct pattern look like?

In an RDBMS I would just slap an alternate key index on the table and let it roll. It seems like maybe that is not the right approach for Cassandra?

Thanks again,
Steve

-----Original Message-----
From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov]
Sent: Tuesday, December 11, 2012 4:57 PM
To: user@cassandra.apache.org
Subject: Re: Primary/secondary index question / best practices?

Hard to help out on a design without specifics, but here is some advice based on the limited information.

Primary key: yes, must be cluster unique. TimeUUID or UUID... PlayOrm has very unique TimeUUID-like keys, as in this one: 7AL2S8Y.b1 (b1 is the hostname and the prefix is a "unique" timestamp, but generated to a shorter string (ah, nice readable primary keys)).

There are some patterns you can look into here that may help: https://github.com/deanhiller/playorm/wiki/Patterns-Page

If you can partition your data virtually, it may help a lot so you can query into the partitions.

Later,
Dean

From: "Stephen.M.Thompson@wellsfargo.com"
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, December 11, 2012 2:49 PM
To: "user@cassandra.apache.org"
Subject: Primary/secondary index question / best practices?

From my reading, it seems like I need a UUID column that will be my primary index, and then I should set up secondary indexes on the 8-10 primary search columns. Am I understanding this correctly? Any advice you can offer on this would be tremendously helpful. I'm quite limited in how specific I can be about the data, of course.
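
The time-partitioning advice in the reply at the top of this thread (partition by month or day, then keep your own index inside each partition) can be illustrated with a rough in-memory sketch. This is not Cassandra or PlayOrm code: `partition_key`, `PartitionedIndex`, the monthly granularity, and the column names are all hypothetical, chosen only to show why a high-entropy column such as an IP address is less of a problem once every lookup is scoped to one partition.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(ts: datetime) -> str:
    """Hypothetical partitioning rule: one partition per calendar month."""
    return ts.strftime("%Y-%m")


class PartitionedIndex:
    """In-memory stand-in for a hand-maintained index, one per (partition, column).

    Each (partition, column) pair plays the role of a single wide index row;
    the values inside it map an indexed value to the row keys that hold it.
    """

    def __init__(self):
        # (partition, column) -> value -> set of row keys
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, row_key: str, ts: datetime, column: str, value: str) -> None:
        """Record that row_key has column == value, filed under ts's partition."""
        self._index[(partition_key(ts), column)][value].add(row_key)

    def lookup(self, partition: str, column: str, value: str) -> list:
        """Return the row keys matching value, searching one partition only."""
        by_value = self._index.get((partition, column), {})
        return sorted(by_value.get(value, ()))


idx = PartitionedIndex()
idx.add("row1", datetime(2012, 12, 11), "ip", "10.0.0.1")
idx.add("row2", datetime(2012, 12, 12), "ip", "10.0.0.1")
idx.add("row3", datetime(2012, 11, 30), "ip", "10.0.0.1")

# Only the December partition is consulted, so row3 is not returned.
print(idx.lookup("2012-12", "ip", "10.0.0.1"))
```

A query always names the partition first, so the size of any one index is bounded by the partition rather than by the whole data set, which is the point of keeping each partition to a few million rows.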