Subject: Re: ballpark low cardinality range for secondary indexes
From: Ed Anuff
To: user@cassandra.apache.org
Cc: Adi
Date: Fri, 8 Apr 2011 14:03:07 -0700

Well, the Amazon paper is good at describing the nature of the problem, but to solve it you'll probably want to use ZooKeeper.
The paper is useful for understanding exactly what you need to lock on, and what you don't, while updating the index, so you can avoid slowing things down any more than necessary.

Ed

On Fri, Apr 8, 2011 at 1:47 PM, Adi wrote:
> Thanks for the suggestions, Ed. Your blog post is quite helpful in deciding
> on and implementing CF inverted indexes.
> Our data definitely leans towards an external CF: it has high cardinality
> (1000s for one column, millions for another), multiple columns need to be
> indexed, and it needs sorted order.
> Hope that Amazon paper has some good tips on solving the transactional
> gotcha :-)
>
> -Adi
>
> On Fri, Apr 8, 2011 at 3:49 PM, Ed Anuff wrote:
>>
>> If you're just indexing on a single column value and the values have
>> low cardinality, say in the tens, I'd have a wide row for each distinct
>> value that contains the set of keys for rows that contain that value.
>> For higher levels of cardinality, or if you're indexing on multiple
>> columns, there are tradeoffs between secondary indexes and CF inverted
>> indexes based on atomicity of updates, complexity of queries, and
>> whether you need to get results in sorted order.
>> Secondary indexes are usually the best starting point since they're
>> easy to set up and use, whereas with CF inverted indexes you'll need
>> to manage all that yourself. Some of the client libraries make it
>> easier to build CF inverted indexes; Hector will soon have some
>> capabilities for JPA users that leverage the new composite column types
>> to do this. I wrote a blog post a while back about indexing approaches at
>> http://www.anuff.com/2011/02/indexing-in-cassandra.html that you might
>> find useful, although it sounds like you're already familiar with the
>> concepts.
>>
>> Ed
>>
>> On Fri, Apr 8, 2011 at 7:53 AM, Adi wrote:
>> > I am trying to decide whether to use secondary indexes or an inverted
>> > index column family for a use case.
>> > Is there a suggested ballpark range of low cardinality for which
>> > secondary indexes are suitable? That is, at what range should using a
>> > secondary index be ruled in or out: cardinality of tens, hundreds,
>> > thousands, millions?
>> > I am not looking for tested numbers; a general suggestion or
>> > best-practice recommendation will suffice.
>> >
>> > Thanks.
>> >
>> > -Adi
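Ed's first suggestion in the thread, one wide row per distinct value whose columns are the keys of the rows holding that value, can be sketched with an in-memory model. This is not the Cassandra or Hector API: plain Python dicts stand in for the data and index column families, and the names `WideRowIndex`, `put`, and `lookup` are hypothetical. It shows why the layout suits low cardinality: a lookup is a single read of one index row.

```python
from collections import defaultdict


class WideRowIndex:
    """Hypothetical in-memory model of a 'wide row per value' index.

    data:  row_key -> {column_name: value}   (stands in for the data CF)
    index: value -> set of row keys          (stands in for the index CF;
           each set plays the role of one wide row's column names)
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.data = {}
        self.index = defaultdict(set)

    def put(self, row_key, columns):
        # Drop the key from the index row of its previous value, if any.
        old = self.data.get(row_key, {}).get(self.indexed_column)
        if old is not None:
            self.index[old].discard(row_key)
            if not self.index[old]:
                del self.index[old]  # remove the now-empty index row
        new = columns.get(self.indexed_column)
        if new is not None:
            self.index[new].add(row_key)
        self.data[row_key] = dict(columns)

    def lookup(self, value):
        # One read of a single index row returns every matching key.
        return sorted(self.index.get(value, ()))


idx = WideRowIndex("state")
idx.put("user1", {"name": "ann", "state": "CA"})
idx.put("user2", {"name": "bob", "state": "NY"})
idx.put("user3", {"name": "cy", "state": "CA"})
print(idx.lookup("CA"))  # ['user1', 'user3']
idx.put("user1", {"name": "ann", "state": "NY"})
print(idx.lookup("NY"))  # ['user1', 'user2']
print(idx.lookup("CA"))  # ['user3']
```

Note that `put` is a read followed by several writes. In Cassandra those would be separate operations, which is the transactional gotcha raised in the thread.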
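For Adi's case (high cardinality, several indexed columns, sorted results), the thread points toward a CF inverted index built on composite column names. The sketch below is an in-memory model under assumed conventions: one index row per indexed column, whose column names are `(value, row_key)` tuples kept in sorted order the way a composite comparator would order them, so a slice between two values returns matching keys already ordered by value. `CompositeIndex` and `range_slice` are hypothetical names, and Python's `bisect` stands in for a Cassandra column slice.

```python
import bisect


class CompositeIndex:
    """Hypothetical in-memory model of a CF inverted index with composite names.

    index[col] is a sorted list of (value, row_key) tuples, standing in for
    one index row whose column names use a composite (value, key) comparator.
    """

    def __init__(self, indexed_columns):
        self.data = {}  # row_key -> {column_name: value}
        self.index = {col: [] for col in indexed_columns}

    def put(self, row_key, columns):
        old = self.data.get(row_key, {})
        for col, names in self.index.items():
            if col in old:
                # Remove the stale (old_value, row_key) column, if present.
                stale = (old[col], row_key)
                pos = bisect.bisect_left(names, stale)
                if pos < len(names) and names[pos] == stale:
                    del names[pos]
            if col in columns:
                bisect.insort(names, (columns[col], row_key))
        self.data[row_key] = dict(columns)

    def range_slice(self, col, start, end):
        """Row keys whose `col` value v satisfies start <= v < end, ordered by v."""
        names = self.index[col]
        # (start,) sorts before every (start, key) tuple, so these bisects find
        # the first name with value >= start and the first with value >= end.
        lo = bisect.bisect_left(names, (start,))
        hi = bisect.bisect_left(names, (end,))
        return [row_key for _, row_key in names[lo:hi]]


ci = CompositeIndex(["age", "city"])
ci.put("u1", {"age": 34, "city": "Oslo"})
ci.put("u2", {"age": 21, "city": "Lima"})
ci.put("u3", {"age": 29, "city": "Oslo"})
print(ci.range_slice("age", 20, 30))  # ['u2', 'u3']
ci.put("u1", {"age": 25, "city": "Lima"})
print(ci.range_slice("age", 20, 30))  # ['u2', 'u1', 'u3']
```

Every `put` has to maintain each indexed column by hand, which is the bookkeeping Ed means by managing it yourself; in exchange, range queries come back in value order.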
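On the transactional gotcha: updating an inverted index is a read-modify-write (read the row's old value, delete the stale index entry, write the new one), and two concurrent writers to the same row can interleave and leave a stale entry behind. Ed suggests ZooKeeper for this. The sketch below only illustrates the locking pattern: a local `threading.Lock` per row key stands in for a distributed lock such as a ZooKeeper lock recipe, no ZooKeeper client is used, and `LockedIndexWriter` is a hypothetical name. Locking per row key rather than globally follows Ed's remark about locking only on what you need.

```python
import threading
from collections import defaultdict


class LockedIndexWriter:
    """Hypothetical sketch: serialize index maintenance per data row key."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.data = {}                 # data CF: row_key -> columns
        self.index = defaultdict(set)  # index CF: value -> row keys
        # Makes each single read or write atomic, like one Cassandra operation.
        self._store = threading.Lock()
        self._row_locks = defaultdict(threading.Lock)
        self._row_locks_guard = threading.Lock()

    def _row_lock(self, row_key):
        # Local stand-in for acquiring a distributed lock named after row_key.
        with self._row_locks_guard:
            return self._row_locks[row_key]

    def put(self, row_key, columns):
        new = columns.get(self.indexed_column)
        # Only writers to the same row key contend for this lock.
        with self._row_lock(row_key):
            with self._store:
                old = self.data.get(row_key, {}).get(self.indexed_column)
            if old is not None and old != new:
                with self._store:
                    self.index[old].discard(row_key)
            if new is not None:
                with self._store:
                    self.index[new].add(row_key)
            with self._store:
                self.data[row_key] = dict(columns)

    def lookup(self, value):
        with self._store:
            return sorted(self.index.get(value, ()))


w = LockedIndexWriter("state")
threads = [
    threading.Thread(target=w.put, args=("user1", {"state": s}))
    for s in ["CA", "NY", "TX"] * 50
]
for t in threads:
    t.start()
for t in threads:
    t.join()
final = w.data["user1"]["state"]
# With the per-row lock, user1 sits in exactly one index row: its current value's.
print([v for v, keys in w.index.items() if "user1" in keys] == [final])  # True
```

Without the per-row lock, two writers could both read the same old value and each add a different new entry, leaving the key in two index rows at once.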