Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: local policy)
From: aaron morton <aaron@thelastpickle.com>
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_EB8F3D8E-8747-4B39-9132-77F8FA8619A6"
Message-Id: <FA1BD176-C5A3-4C87-8DB0-5A83A93BDABD@thelastpickle.com>
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: Read IO
Date: Sat, 23 Feb 2013 05:37:17 +1300
References: 
 <57C7C3CBDCB04F45A57AEC4CB21C0CCD1DB32F50@mbx024-e1-nj-6.exch024.domain.local>
 <FAA9FDB0-5C9D-4B33-9181-00BE38819A32@reaktor.fi>
To: user@cassandra.apache.org
In-Reply-To: <FAA9FDB0-5C9D-4B33-9181-00BE38819A32@reaktor.fi>


--Apple-Mail=_EB8F3D8E-8747-4B39-9132-77F8FA8619A6
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1252

AFAIK this is still roughly correct:
http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

It includes information on the page size read from disk.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 5:45 AM, Jouni Hartikainen
<jouni.hartikainen@reaktor.fi> wrote:

>
> Hi,
>
> On Feb 21, 2013, at 7:52 , Kanwar Sangha <kanwar@mavenir.com> wrote:
>> Hi - Can someone explain the worst case IOPS for a read? No key cache,
>> no row cache, sampling rate say 512.
>>
>> 1)      Bloom filter will be checked to see existence of key (in RAM)
>> 2)      Index file sample (in RAM) will be checked to find approx.
>>         location in index file on disk
>> 3)      1 IOPS to read the actual index file on disk (DISK)
>> 4)      1 IOPS to get the data from the location in the sstable (DISK)
>>
>> Is this correct?
>
> As you were asking for the worst case, I would still add one step: a
> seek inside an SSTable from the row start to the queried columns using
> the column index.
>
> However, this applies only if you are querying a subset of the columns
> in the row (not all) and the total row size exceeds
> column_index_size_in_kb (defaults to 64 KB).
>
> So, as far as I have understood, the worst case steps (without any
> caches) are:
>
> 1. Check the SSTable bloom filters (in memory)
> 2. Use index samples to find the approximate place in the key index
>    file (in memory)
> 3. Read the key index file until the correct key is found (1st disk
>    seek & read)
> 4. Seek to the start of the row in the SSTable file and read the row
>    headers (possibly including the column index) (2nd seek & read)
> 5. Using the column index, seek to the correct place inside the SSTable
>    file to actually read the columns (3rd seek & read)
>
> If the row is very wide and you are asking for a random bunch of
> columns from here and there, step 5 might even be needed multiple
> times. Also, if your row is spread over many SSTables, each of them
> needs to be accessed (at least steps 1-4) to get the complete results
> for the query.
>
> All this in mind, if your node has any reasonable amount of reads, I'd
> say that in practice the key index files will be page cached by the OS
> very quickly, and thus a normal read would end up being either one seek
> (for small rows without the column index) or two (for wider rows). Of
> course, as Peter already pointed out, the more columns you ask for, the
> more the disk needs to read. For a contiguous set of columns the read
> should be sequential, however.
>
> -Jouni
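
The seek counting described in the quoted message can be written down as a
small back-of-the-envelope model. This is only a sketch of that reasoning,
not Cassandra code: the function name and all parameter names are
hypothetical, it assumes every SSTable passes its bloom filter check, and
it counts one seek per scattered group of columns.

```python
def worst_case_seeks(sstables, row_size_kb, column_slices=1,
                     reads_whole_row=False, index_page_cached=False,
                     column_index_size_kb=64):
    """Estimate disk seeks for one row read (hypothetical model).

    sstables:             SSTables that hold part of the row (assumed to
                          all pass the in-memory bloom filter check)
    row_size_kb:          total row size in one SSTable
    column_slices:        scattered column groups requested (assumption:
                          one extra seek per group)
    reads_whole_row:      True if all columns are requested
    index_page_cached:    True if the OS page cache holds the key index
    column_index_size_kb: threshold above which a column index exists
    """
    seeks_per_sstable = 0

    # Bloom filter and index sample lookups are in memory: no disk seek.

    # Read the key index file until the key is found, unless the OS has
    # already cached it.
    if not index_page_cached:
        seeks_per_sstable += 1

    # Seek to the start of the row and read the row headers.
    seeks_per_sstable += 1

    # The column index is only used for partial reads of rows larger
    # than the column index threshold.
    uses_column_index = (not reads_whole_row
                         and row_size_kb > column_index_size_kb)
    if uses_column_index:
        seeks_per_sstable += column_slices

    return sstables * seeks_per_sstable
```

With these assumptions, a small row in one SSTable costs two seeks with a
cold key index and one with a cached index, and a wide row with a cached
index costs two, which matches the "one seek or two" estimate above.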


--Apple-Mail=_EB8F3D8E-8747-4B39-9132-77F8FA8619A6--