In-Reply-To: <1297523753.19226.10.camel@dehora-laptop>
References: <1297495441759-6018234.post@n2.nabble.com>
	<1297515364.13573.21.camel@dehora-laptop>
	<4D56853B.9030004@gmail.com>
	<1297523753.19226.10.camel@dehora-laptop>
Date: Sat, 12 Feb 2011 10:38:57 -0800
Message-ID: <AANLkTimravp=GcqkZwFi1Mbc5En63icSZqjFxXnaGbnK@mail.gmail.com>
Subject: Re: Indexes and hard disk
From: Dave Revell <dave@meebo-inc.com>
To: user@cassandra.apache.org

Built-in indexes have another important advantage over multiple denormalized
column families. If you make the copies yourself, eventually the copies will
diverge from the base "true" column family because of ordinary, occasional
failures (for example, a client that dies after writing the base row but
before writing the copy). You'll probably want to find and fix these
inconsistencies.

If you're using built-in indexes, Cassandra updates the index as part of
each write, so you won't have this problem and you can skip writing that
repair code yourself.
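To make the failure mode concrete, here is a toy in-memory sketch, assuming plain Python dicts stand in for column families. The names `base_cf`, `by_birth_cf`, `save_user`, `find_inconsistencies` and `repair` are hypothetical and are not part of any Cassandra client API. It shows a hand-maintained copy keyed by birth date drifting from the base column family after a simulated crash between the two writes, and the kind of scan-and-fix job you would then need to run.

```python
# Toy model only (hypothetical names, not the Cassandra API):
# each "column family" is a dict of row key -> columns.

class CrashBetweenWrites(Exception):
    """Simulates a client dying after the first write but before the second."""

base_cf = {}      # the "true" column family: user_id -> {column: value}
by_birth_cf = {}  # hand-maintained copy: birth_date -> {user_id: ""}

def save_user(user_id, columns, fail_after_base=False):
    base_cf[user_id] = dict(columns)          # write 1: base column family
    if fail_after_base:
        raise CrashBetweenWrites(user_id)     # the copy never gets written
    # write 2: denormalized copy, keyed by the value we want to query on
    by_birth_cf.setdefault(columns["birth_date"], {})[user_id] = ""

def find_inconsistencies():
    """Return (missing, stale) lists of (user_id, birth_date) pairs."""
    missing = [(uid, cols["birth_date"]) for uid, cols in base_cf.items()
               if uid not in by_birth_cf.get(cols["birth_date"], {})]
    stale = [(uid, date) for date, uids in by_birth_cf.items() for uid in uids
             if base_cf.get(uid, {}).get("birth_date") != date]
    return missing, stale

def repair():
    missing, stale = find_inconsistencies()
    for uid, date in missing:                 # add entries the copy never got
        by_birth_cf.setdefault(date, {})[uid] = ""
    for uid, date in stale:                   # drop entries that no longer match
        del by_birth_cf[date][uid]
        if not by_birth_cf[date]:
            del by_birth_cf[date]

save_user("alice", {"birth_date": "1980-01-01"})
try:
    save_user("bob", {"birth_date": "1975-06-30"}, fail_after_base=True)
except CrashBetweenWrites:
    pass  # bob is now in base_cf but missing from by_birth_cf
print(find_inconsistencies())  # ([('bob', '1975-06-30')], [])
repair()
print(find_inconsistencies())  # ([], [])
```

A real repair job would have to page through the whole base column family, which is exactly the effort a built-in index saves you.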

-Dave
On Feb 12, 2011 7:16 AM, "Bill de hÓra" <bill@dehora.net> wrote:
> On Sat, 2011-02-12 at 14:03 +0100, Filip Nguyen wrote:
>
>> Why are secondary indexes even present in Cassandra? I thought the
>> point was that development in Cassandra is query-driven: when you
>> want to search and fetch, for example, by birth date, you should create
>> a new ColumnFamily...
>
> Yes and no. Systems like Cassandra are designed so that you
> write the data out the way you want to read it back (because writes are cheap).
> However, most systems will want to access data by a few other criteria.
> For example, a blogging system that supports tags will need to list your
> blog entries by date and by tag equally efficiently. As you say, you
> can spin up a new ColumnFamily for that, but it's such a common need
> that Cassandra 0.7 supports it directly and saves developers from having to
> manage indexes by hand (under the hood, a 0.7 index is a 'private' CF).
> For me this is one of the features that really sets Cassandra apart:
> scaling and indexing data at the same time is hard, and very few systems
> do both well.
>
> Bill
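The idea that "an index is a 'private' CF" maintained for you can be sketched as a toy in-memory model of the blog example. This is an illustration under stated assumptions, not how Cassandra 0.7 stores, distributes or replicates index data; `IndexedColumnFamily`, `put` and `query_by` are hypothetical names. The point it shows is that the index update happens inside the same write call, so a retagged entry moves from the old tag's index entry to the new one without any client-side bookkeeping.

```python
from collections import defaultdict

class IndexedColumnFamily:
    """Toy model (hypothetical): rows plus one hidden index per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = {}  # row key -> {column name: value}
        # column name -> column value -> set of row keys (the "private" CFs)
        self.indexes = {col: defaultdict(set) for col in indexed_columns}

    def put(self, key, columns):
        old = self.rows.get(key, {})
        for col, index in self.indexes.items():
            if col in columns and old.get(col) != columns[col]:
                if col in old:
                    index[old[col]].discard(key)  # remove the stale index entry
                index[columns[col]].add(key)      # add the new index entry
        self.rows[key] = {**old, **columns}       # merge columns into the row

    def query_by(self, column, value):
        # Equality lookup through the index instead of scanning every row;
        # sorted only to make the output deterministic.
        return sorted(self.indexes[column].get(value, set()))

entries = IndexedColumnFamily(indexed_columns=["tag"])
entries.put("post1", {"date": "2011-02-10", "tag": "cassandra"})
entries.put("post2", {"date": "2011-02-11", "tag": "java"})
entries.put("post3", {"date": "2011-02-12", "tag": "cassandra"})
entries.put("post2", {"tag": "cassandra"})  # retag: index updated in the same call
print(entries.query_by("tag", "cassandra"))  # ['post1', 'post2', 'post3']
print(entries.query_by("tag", "java"))       # []
```

Built-in 0.7 indexes answer equality queries on a column value like this; listing entries in date order within a tag is still something you might model with your own ColumnFamily.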
