From: aaron morton
Subject: Re: Read IO
Date: Sat, 23 Feb 2013 05:37:17 +1300
To: user@cassandra.apache.org

AFAIK this is still roughly correct: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

It includes information on the page size read from disk.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 5:45 AM, Jouni Hartikainen <jouni.hartikainen@reaktor.fi> wrote:


> Hi,
>
> On Feb 21, 2013, at 7:52, Kanwar Sangha <kanwar@mavenir.com> wrote:
>> Hi, can someone explain the worst-case IOPS for a read? No key cache, no row cache, and an index sampling rate of, say, 512.
>>
>> 1) The Bloom filter is checked to see whether the key exists (in RAM)
>> 2) The index file sample (in RAM) is checked to find the approximate location in the index file on disk
>> 3) 1 I/O to read the actual index file on disk (DISK)
>> 4) 1 I/O to get the data from that location in the SSTable (DISK)
>>
>> Is this correct?
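A minimal Python sketch of steps 2 and 3 above, under a deliberately simplified model that is an assumption, not Cassandra's on-disk format: the key index file is treated as a sorted list of (row key, data file offset) pairs, and the in-memory sample keeps every `interval`-th entry (512 in the question). A binary search over the sample, using the standard `bisect` module, picks where to start reading the index file, and the forward scan then touches at most `interval` entries; that scan stands in for the single index-file read counted in step 3. `build_index_sample` and `lookup` are hypothetical names.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical, simplified model: the "key index file" is a sorted list of
# (row_key, data_file_offset) pairs. Real SSTable index files differ.
IndexEntry = Tuple[str, int]


def build_index_sample(index: List[IndexEntry], interval: int) -> Tuple[List[str], List[int]]:
    """Keep every `interval`-th key in memory, with its position in the index."""
    positions = list(range(0, len(index), interval))
    keys = [index[p][0] for p in positions]
    return keys, positions


def lookup(index: List[IndexEntry], sample: Tuple[List[str], List[int]],
           interval: int, key: str) -> Tuple[Optional[int], int]:
    """Return (data file offset or None, number of index entries scanned)."""
    sample_keys, sample_positions = sample
    # In memory: find the last sampled key that is <= the requested key.
    slot = bisect.bisect_right(sample_keys, key) - 1
    if slot < 0:
        return None, 0  # sorts before the first key; no disk read needed
    start = sample_positions[slot]
    scanned = 0
    # Modelled disk read: scan forward through at most `interval` entries.
    for row_key, offset in index[start:start + interval]:
        scanned += 1
        if row_key == key:
            return offset, scanned
        if row_key > key:
            break  # passed the place where the key would be: not present
    return None, scanned


if __name__ == "__main__":
    index = [(f"key{i:05d}", i * 100) for i in range(2000)]
    sample = build_index_sample(index, 512)
    print(lookup(index, sample, 512, "key01030"))  # (103000, 7)
```

The point of the sample is the bound on the scan: no matter which key is requested, the modelled index read never walks more than `interval` entries.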

> As you were asking for the worst case, I would still add one step: a seek inside the SSTable from the row start to the queried columns, using the column index.
>
> However, this applies only if you are querying a subset of the columns in the row (not all of them) and the total row size exceeds column_index_size_in_kb (defaults to 64 KB).
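The condition above, and the lookup it enables, can be sketched like this. The model is hypothetical and simplified: column names are compared as plain strings, and each column-index entry records only the first column name of a roughly 64 KB block plus that block's offset within the row (the real index stores more). `ColumnIndexBlock`, `needs_column_index_seek` and `block_for_column` are illustrative names, not Cassandra APIs; only the 64 KB default for `column_index_size_in_kb` comes from the message.

```python
import bisect
from dataclasses import dataclass
from typing import List

COLUMN_INDEX_SIZE_IN_KB = 64  # default mentioned in the message


@dataclass
class ColumnIndexBlock:
    # Hypothetical, simplified entry: first column name in a block and the
    # block's byte offset from the start of the row.
    first_column: str
    offset: int


def needs_column_index_seek(row_size_bytes: int, whole_row: bool) -> bool:
    """True only when a subset of columns is read from a row above the threshold."""
    return not whole_row and row_size_bytes > COLUMN_INDEX_SIZE_IN_KB * 1024


def block_for_column(index: List[ColumnIndexBlock], column: str) -> ColumnIndexBlock:
    """Return the block whose range may contain `column` (names compared as strings)."""
    firsts = [block.first_column for block in index]
    # Last block whose first column is <= the requested column; clamp to block 0.
    position = max(bisect.bisect_right(firsts, column) - 1, 0)
    return index[position]
```

With blocks starting at columns "a", "m" and "t", a request for column "p" lands in the second block, so the extra seek goes straight to that block's offset instead of reading the row from the start.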

> So, as far as I have understood, the worst-case steps (without any caches) are:
>
> 1. Check the SSTable Bloom filters (in memory)
> 2. Use the index samples to find the approximately correct place in the key index file (in memory)
> 3. Read the key index file until the correct key is found (1st disk seek & read)
> 4. Seek to the start of the row in the SSTable file and read the row headers, possibly including the column index (2nd seek & read)
> 5. Using the column index, seek to the correct place inside the SSTable file to actually read the columns (3rd seek & read)
>
> If the row is very wide and you are asking for a random bunch of columns from here and there, step 5 might even be needed multiple times. Also, if your row is spread over many SSTables, each of them needs to be accessed (at least steps 1-4) to get the complete results for the query.
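As a back-of-the-envelope model of the worst case described above, the steps can be turned into a seek counter. This is a sketch under stated assumptions rather than a description of Cassandra's code: every SSTable whose Bloom filter reports a possible match is assumed to actually hold part of the row (false positives are not modelled), each such SSTable costs one key index read plus one seek to the row start, and a wide row queried for a subset of its columns adds one seek per separate run of column-index blocks. `SSTableFragment` and `worst_case_seeks` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SSTableFragment:
    # Hypothetical description of the part of one row stored in one SSTable.
    bloom_filter_hit: bool      # assumed to mean the row really is in this SSTable
    row_size_bytes: int
    column_block_runs: int = 1  # separate runs of column-index blocks to read


def worst_case_seeks(fragments: List[SSTableFragment], whole_row: bool,
                     column_index_bytes: int = 64 * 1024) -> int:
    """Count disk seeks for one read with no key cache and no row cache."""
    seeks = 0
    for fragment in fragments:
        if not fragment.bloom_filter_hit:
            continue  # Bloom filter says no: checked in memory, no disk access
        seeks += 1  # read the key index file until the key is found
        seeks += 1  # seek to the row start and read the row header
        if not whole_row and fragment.row_size_bytes > column_index_bytes:
            seeks += fragment.column_block_runs  # seek(s) via the column index
    return seeks
```

For example, a wide row spread over two SSTables, queried for three scattered column ranges in one of them, would cost 5 + 2 = 7 seeks in this model, which is why reading a row from many SSTables gets expensive.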

> With all this in mind, if your node serves any reasonable number of reads, I'd say that in practice the key index files will be page cached by the OS very quickly, so a normal read would end up being either one seek (for small rows without a column index) or two (for wider rows). Of course, as Peter already pointed out, the more columns you ask for, the more the disk needs to read. For a contiguous set of columns the read should be linear, however.
>
> -Jouni
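Jouni's practical estimate, with the key index files already in the OS page cache, can be sketched in the same spirit. Again this is a hypothetical model: the column blocks a query needs are given as block numbers, adjacent blocks are assumed to be read sequentially after a single seek, and each gap between requested blocks costs another seek. It shows why a contiguous slice of columns stays at two seeks for a wide row, while a random set of columns costs more. `column_read_seeks` and `seeks_with_cached_index` are illustrative names.

```python
from typing import Iterable, Optional


def column_read_seeks(block_ids: Iterable[int]) -> int:
    """Seeks needed to read the given column-index blocks of one row.

    Assumes adjacent blocks are read sequentially after a single seek and
    every gap between requested blocks costs one more seek.
    """
    seeks = 0
    previous: Optional[int] = None
    for block in sorted(set(block_ids)):
        if previous is None or block != previous + 1:
            seeks += 1
        previous = block
    return seeks


def seeks_with_cached_index(row_is_wide: bool, block_ids: Iterable[int]) -> int:
    """Disk seeks for one SSTable once its key index file is in the OS page cache."""
    if not row_is_wide:
        return 1  # small row: a single seek reads the header and the columns
    return 1 + column_read_seeks(block_ids)  # row header, then the column blocks
```

For instance, `seeks_with_cached_index(True, [3, 4, 5, 6])` gives 2, whereas `seeks_with_cached_index(True, [0, 7, 12])` gives 4.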
