From: Dave Revell
To: user@cassandra.apache.org
Date: Sat, 12 Feb 2011 10:38:57 -0800
Subject: Re: Indexes and hard disk

Indexes have another important advantage over multiple denormalized column
families. If you make the copies yourself, the copies will eventually
diverge from the base "true" column family because of routine, occasional
failures, and you'll probably want to find and fix these inconsistencies.
If you're using built-in indexes, you won't have this problem and you can
save some effort.

-Dave

On Feb 12, 2011 7:16 AM, "Bill de hÓra" <bill@dehora.net> wrote:
> On Sat, 2011-02-12 at 14:03 +0100, Filip Nguyen wrote:
>
>> Why are secondary indexes even present in Cassandra? I thought the
>> point is that development in Cassandra is query driven, so that when
>> you want to search and fetch, for example, by birth date, you should
>> create a new ColumnFamily...
>
> Yes and no. Systems like Cassandra are designed such that you should
> write the data out as you want to read it in (because writes are cheap).
> However, most systems will want to access data via a few other criteria.
> For example, a blogging system that supports tags will need to list your
> blog entries by date and by tag equally efficiently. As you say, you
> can spin up a new ColumnFamily for that, but it's such a common need
> that Cassandra 0.7 supports it directly and saves developers from having
> to manage indexes by hand (under the hood, a 0.7 index is a 'private' CF).
> This for me is one of the features that really sets Cassandra apart -
> scaling and indexing data at the same time is hard, and very few systems
> do both well.
>
> Bill
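
To make the divergence point from this thread concrete, here is a minimal sketch in plain Python. It does not use any Cassandra client API: the base column family and the hand-maintained index are modeled as ordinary dicts, and every name (write_user, find_inconsistencies, FlakyWriteError, the fail_index_write flag) is hypothetical and exists only for illustration. It assumes the application performs two independent writes, one to the base data and one to the index, and that the second can fail on its own.

```python
from collections import defaultdict


class FlakyWriteError(Exception):
    """Simulated failure of the second (index) write. Hypothetical."""


def write_user(base, index, key, birth_date, fail_index_write=False):
    """Write a row to the base data, then update the hand-maintained index.

    The two writes are independent, so a failure between them leaves the
    index out of step with the base data.
    """
    old = base.get(key)
    base[key] = birth_date              # write 1: base "column family"
    if fail_index_write:
        raise FlakyWriteError(key)      # write 2 never happens
    if old is not None:
        index[old].discard(key)         # drop the outdated index entry
    index[birth_date].add(key)          # write 2: index "column family"


def find_inconsistencies(base, index):
    """Return (missing, stale) index entries as sets of (birth_date, key)."""
    expected = {(date, key) for key, date in base.items()}
    actual = {(date, key) for date, keys in index.items() for key in keys}
    return expected - actual, actual - expected


base = {}
index = defaultdict(set)
write_user(base, index, "alice", "1980-01-01")
write_user(base, index, "bob", "1975-06-15")
try:
    # The base row is updated, but the index update is lost.
    write_user(base, index, "alice", "1980-01-02", fail_index_write=True)
except FlakyWriteError:
    pass

missing, stale = find_inconsistencies(base, index)
print(missing)  # {('1980-01-02', 'alice')}
print(stale)    # {('1980-01-01', 'alice')}
```

The find_inconsistencies pass is the "find and fix" effort that hand-maintained copies call for; in this toy model it is a full scan of the base data, which is the kind of work a built-in index is meant to spare you.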