From: Adi <adi.pandit@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 8 Apr 2011 16:47:48 -0400
Subject: Re: ballpark low cardinality range for secondary indexes

Thanks for the suggestions, Ed. Your blog post is quite helpful in deciding on and implementing CF inverted indexes. Our data definitely leans towards an external CF: it has high cardinality (1000s of distinct values for one column, millions for another), multiple columns need to be indexed, and results need to come back in sorted order. Hopefully that Amazon paper has some good tips on solving the transactional gotcha :-)

-Adi

On Fri, Apr 8, 2011 at 3:49 PM, Ed Anuff <ed@anuff.com> wrote:
> If you're just indexing on a single column value and the values have
> low cardinality in, say, the 10s, I'd have a wide row for each
> cardinal value that contained the set of keys for rows that contained
> that value. For higher levels of cardinality, or if you're indexing on
> multiple columns, there are tradeoffs between secondary indexes and CF
> inverted indexes based on atomicity of updates, complexity of
> queries, and whether you need to get results in sorted order.
> Secondary indexes are usually the best starting point since they're
> easy to set up and use, whereas with CF inverted indexes you'll need
> to manage all of that yourself.
> Some of the client libraries make it easier to build CF inverted
> indexes; Hector will soon have some capabilities for JPA users that
> leverage the new composite column types to do this. I wrote up a blog
> post a while back about indexing approaches at
> http://www.anuff.com/2011/02/indexing-in-cassandra.html that you might
> find useful, although it sounds like you're already familiar with the
> concepts.
>
> Ed
>
> On Fri, Apr 8, 2011 at 7:53 AM, Adi <adi.pandit@gmail.com> wrote:
> > I am trying to decide whether to use secondary indexes or an inverted
> > index column family for a use case. Is there any suggested ballpark
> > range of low cardinality for which secondary indexes are suitable?
> > That is, at what range should using a secondary index be ruled in or
> > out: cardinality of tens, hundreds, thousands, millions?
> > I am not looking for tested numbers; a general suggestion or
> > best-practice recommendation will suffice.
> >
> > Thanks.
> >
> > -Adi
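For readers weighing the two approaches, here is a minimal sketch of the CF inverted-index pattern discussed in this thread: one "wide row" per indexed value whose column names are the keys of the matching data rows, kept in sorted order much as Cassandra keeps the columns of a row sorted. It is modelled with plain Python dictionaries, not a real Cassandra client; no Hector or Thrift API is assumed, and the names `IndexedStore`, `put` and `lookup` are hypothetical. The point it illustrates is the bookkeeping Ed mentions: when a value changes, the application must itself remove the stale index entry and add the new one.

```python
import bisect
from collections import defaultdict


class IndexedStore:
    """Hypothetical in-memory model of a data CF plus one inverted-index CF.

    data:  row_key -> {column_name: value}
    index: indexed value -> sorted list of row keys (one "wide row" per value)
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.data = {}
        self.index = defaultdict(list)

    def _index_add(self, value, row_key):
        keys = self.index[value]
        pos = bisect.bisect_left(keys, row_key)
        if pos == len(keys) or keys[pos] != row_key:
            keys.insert(pos, row_key)  # keep the "column names" sorted, no duplicates

    def _index_remove(self, value, row_key):
        keys = self.index.get(value, [])
        pos = bisect.bisect_left(keys, row_key)
        if pos < len(keys) and keys[pos] == row_key:
            del keys[pos]
        if not keys:
            self.index.pop(value, None)  # drop index rows that became empty

    def put(self, row_key, columns):
        # Read old value, write data row, then fix the index. These are
        # separate steps; in a real cluster they would be separate writes,
        # so a failure in between could leave a stale index entry.
        old = self.data.get(row_key, {}).get(self.indexed_column)
        self.data.setdefault(row_key, {}).update(columns)
        new = self.data[row_key].get(self.indexed_column)
        if old is not None and old != new:
            self._index_remove(old, row_key)
        if new is not None:
            self._index_add(new, row_key)

    def lookup(self, value, start="", count=100):
        """Up to `count` row keys holding `value`, sorted, beginning at `start`."""
        keys = self.index.get(value, [])
        pos = bisect.bisect_left(keys, start)
        return keys[pos:pos + count]


if __name__ == "__main__":
    store = IndexedStore("state")
    store.put("user3", {"state": "TX"})
    store.put("user1", {"state": "TX"})
    store.put("user2", {"state": "CA"})
    print(store.lookup("TX"))  # ['user1', 'user3']
    store.put("user3", {"state": "CA"})  # value changes: index entry moves
    print(store.lookup("CA"))  # ['user2', 'user3']
    print(store.lookup("TX"))  # ['user1']
```

The `start`/`count` arguments loosely mimic paging through a column slice, which is how sorted results would be read back from a wide index row. Indexing several columns would mean one such index per column, each needing the same update bookkeeping.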