From: Ed Anuff
To: user@cassandra.apache.org
Date: Fri, 8 Apr 2011 12:49:41 -0700
Subject: Re: ballpark low cardinality range for secondary indexes

If you're indexing on a single column and the values have low cardinality (say, in the tens), I'd keep a wide row for each distinct value, holding the set of keys of the rows that contain that value. For higher cardinality, or if you're indexing on multiple columns, the tradeoffs between secondary indexes and CF inverted indexes come down to atomicity of updates, complexity of queries, and whether you need results in sorted order. Secondary indexes are usually the best starting point since they're easy to set up and use, whereas with CF inverted indexes you'll need to manage all of that yourself. Some of the client libraries make it easier to build CF inverted indexes; Hector will soon have some capabilities for JPA users that leverage the new composite column types to do this.

I wrote up a blog post a while back on indexing approaches at http://www.anuff.com/2011/02/indexing-in-cassandra.html that you might find useful, although it sounds like you're already familiar with the concepts.

Ed

On Fri, Apr 8, 2011 at 7:53 AM, Adi wrote:
> I am trying to decide whether to use secondary indexes or an inverted
> index column family for a use case. Is there a suggested ballpark range
> of low cardinality for which secondary indexes are suitable?
> Meaning, at what range should using a secondary index be ruled in or out:
> cardinality of tens, hundreds, thousands, millions?
> I am not looking for tested numbers; a general suggestion or best-practice
> recommendation will suffice.
>
> Thanks.
>
> -Adi
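
As a rough illustration of the wide-row approach described in the reply (one index row per distinct value, holding the keys of the data rows that contain it), here is a minimal in-memory Python sketch. It does not talk to Cassandra at all; the class name InvertedIndex, its methods, and the sample values and keys are hypothetical, chosen only to show the shape of the data and why updates raise an atomicity question.

```python
from collections import defaultdict


class InvertedIndex:
    """In-memory stand-in for an index column family (hypothetical names).

    Each index "row" is keyed by an indexed value; its "columns" are the
    keys of the data rows that currently hold that value.
    """

    def __init__(self):
        self.rows = defaultdict(set)

    def add(self, value, row_key):
        self.rows[value].add(row_key)

    def remove(self, value, row_key):
        self.rows[value].discard(row_key)

    def update(self, old_value, new_value, row_key):
        # Two separate writes. Against a real cluster these would not be
        # atomic, which is the update-atomicity tradeoff mentioned above.
        self.remove(old_value, row_key)
        self.add(new_value, row_key)

    def lookup(self, value):
        # Columns within a Cassandra row are stored sorted by name;
        # sorted() mimics that ordering here.
        return sorted(self.rows.get(value, ()))


index = InvertedIndex()
index.add("CA", "user1")
index.add("CA", "user2")
index.add("NY", "user3")
index.update("CA", "NY", "user2")  # user2 moves from CA to NY

result_ca = index.lookup("CA")
result_ny = index.lookup("NY")
```

With only a handful of distinct values, each index row simply grows wide, and a lookup is a single row read. The update method is where a hand-managed index needs care, since the old entry has to be removed as well as the new one added.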