Subject: Re: Worst case #iops to read a row
From: Time Less
To: user@cassandra.apache.org
Date: Mon, 12 Apr 2010 13:45:49 -0700

>> worst case is 2 or 3, depending on row size:
>>
>> one seek to read the right row index block
>> one seek to read the row header (bloom filter + column index)
>> if it's a big row, one seek to read the column block (block size is
>> configurable, default is 256KB)
>
> [This is all per-sstable that contains the row]

I'm confused. That's really worst-case? 3 iops?

What if we have 10B rows in the column family? What sort of index do you use that would only require one iop to find the row index block?

And what about multiple revisions of data, i.e., if there were N updates and M deletes on the key before a major compaction? And what about Bloom filter false positives? What if the client asks a node that doesn't have the key? None of those cause iops?
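
To make the multiple-revisions question concrete, here is a rough sketch of the arithmetic as I understand it. It assumes the quoted 2-or-3-seek figure applies to every sstable that holds a fragment of the row, and that updates made before a major compaction can leave the key in several sstables. The function and parameter names are hypothetical, and the sketch ignores Bloom filter false positives and any caching.

```python
def worst_case_seeks(sstables_with_row: int, big_row: bool) -> int:
    """Upper bound on disk seeks for one row read, per the quoted breakdown.

    Assumption: the per-sstable cost simply multiplies by the number of
    sstables that contain some version of the row.
    """
    # Per sstable: one seek for the row index block, one for the row header
    # (bloom filter + column index), plus one more for the column block
    # when the row is big.
    per_sstable = 3 if big_row else 2
    return per_sstable * sstables_with_row


if __name__ == "__main__":
    for count in (1, 4, 10):
        print(count, "sstables ->", worst_case_seeks(count, big_row=True), "seeks")
```

If that multiplication is right, "3 iops" is the worst case per sstable rather than per read, which is the part I would like confirmed.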

Forgive my naïveté, but having worked with large datasets all my life, I'm having a really hard time wrapping my head around what sort of data structures and cluster layout would allow you to retrieve data in so few iops.

--
timeless(ness)
