Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of timelessness@gmail.com
 designates 74.125.83.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <t2ke06563881004090928k8ea950d1t9d85933cf4b74b50@mail.gmail.com>
References: <61401.54585.qm@web111713.mail.gq1.yahoo.com>
	 <w2ze06563881004090839q7027f8f7hd858018046300029@mail.gmail.com>
	 <x2g7c5131fa1004090923v76d726fdnf105d6e5d4d3c91b@mail.gmail.com>
	 <t2ke06563881004090928k8ea950d1t9d85933cf4b74b50@mail.gmail.com>
Date: Mon, 12 Apr 2010 13:45:49 -0700
Message-ID: <l2q8ddbf2ee1004121345s31f6510et2923757962c01932@mail.gmail.com>
Subject: Re: Worst case #iops to read a row
From: Time Less <timelessness@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=000e0cd378a6ee74af0484103ac3

--000e0cd378a6ee74af0484103ac3
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

> >> worst case is 2 or 3, depending on row size:
> >>
> >> one seek to read the right row index block
> >> one seek to read the row header (bloom filter + column index)
> >> if it's a big row, one seek to read the column block (block size is
> >> configurable, default is 256KB)
> >
> > [This is all per-sstable that contains the row]
>
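
To make the arithmetic in the quoted answer concrete, here is a minimal
sketch of the seek count it describes. It assumes only what the quote states:
two seeks per sstable that contains the row, plus a third for a big row. The
function name and parameters are hypothetical, not Cassandra APIs, and the
sketch deliberately ignores anything the quote does not cover (Bloom filter
false positives, caching, requests routed to a node without the key).

```python
def worst_case_seeks(sstables_with_row: int, big_row: bool) -> int:
    """Worst-case disk seeks to read one row, per the quoted breakdown.

    sstables_with_row: number of sstables that contain the row (hypothetical input).
    big_row: True if the row is large enough to need a separate column-block seek.
    """
    # One seek for the row index block, one for the row header
    # (bloom filter + column index).
    per_sstable = 2
    if big_row:
        # A big row needs one more seek to reach the column block.
        per_sstable += 1
    # The quoted figures are per sstable that contains the row.
    return sstables_with_row * per_sstable


if __name__ == "__main__":
    for count in (1, 4):
        for big in (False, True):
            print(count, big, worst_case_seeks(count, big))
```

Under these assumptions the "2 or 3" figure holds only when a single sstable
contains the row; a row spread over several sstables multiplies it.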

I'm confused. That's really worst-case? 3 iops?

What if we have 10B rows in the column family? What sort of index do you use
that would only require one iop to find the row index block?

And what about multiple revisions of data, i.e., if there were N updates and M
deletes on the key before a major compaction? And what about Bloom filter
false positives? What if the client asks a node that doesn't have the key?
Do none of those cause iops?

Forgive my naïveté, but having worked with large datasets all my life, I'm
having a really hard time wrapping my head around what sort of data
structures and cluster layout would allow you to retrieve data in so few
iops.

-- 
timeless(ness)

--000e0cd378a6ee74af0484103ac3--