Date: Wed, 11 Aug 2010 11:45:54 -0400
Subject: Re: Soliciting thoughts on possible read optimization
From: Edward Capriolo
To: user@cassandra.apache.org

On Wed, Aug 11, 2010 at 11:37 AM, Ryan King wrote:
> On Tue, Aug 10, 2010 at 8:43 PM, Arya Asemanfar wrote:
>> I mentioned this today to a couple folks at Cassandra Summit, and thought
>> I'd solicit some more thoughts here.
>> Currently, the read stage includes checking row cache. So if your concurrent
>> reads is N and you have N reads reading from disk, the next read will block
>> until a disk read finishes, even if it's in row cache. Would it make sense
>> to isolate disk reads from cache reads? To either make the read stage be
>> only used on misses, or to make 2 read stages CacheRead and DiskRead? Of
>> course, we'd have to go to DiskRead for mmap since we wouldn't know until we
>> asked the OS.
>> My thought is that stages should be based on resources rather than
>> semantics, but that may be wrong. Logically, I don't think it would make
>> sense to have the read stage bounded in a hypothetical system where there is
>> no IO; it's most likely because of the disk and subsequent IO contention
>> that that cap was introduced.
>> As a possible bonus with this change, you can make other optimizations like
>> batching row reads from disk where the keys were in key cache (does this
>> even make sense? I'm not too sure how that would work).
>
> I think this is a reasonable analysis. The idea of stages in the
> research SEDA is to put bounds around scarce resources. I wouldn't
> call reading from the row cache a scarce resource. I'd expect this
> change to have significant performance improvements for workloads that
> are heavily rowcache-able.
>
> -ryan
>

I think that makes sense. If I understand correctly the only type of reads
that will be served purely from Row Cache would be CL.ONE, so reads of
QUORUM or ALL would skip this stage.
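
To make the proposal in this thread concrete, here is a rough sketch of splitting the read path so that only row-cache misses go through the bounded stage. It is written in Python for brevity and is not Cassandra's stage code: SplitReadPath, disk_read, cache_row and concurrent_reads are hypothetical names, and the sketch ignores mmap, cache eviction and consistency levels.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class SplitReadPath:
    """Hypothetical sketch: serve row-cache hits inline, queue only misses."""

    def __init__(self, disk_read, concurrent_reads):
        self._disk_read = disk_read      # assumed callable: key -> row (slow)
        self._row_cache = {}
        self._lock = threading.Lock()
        # Only the scarce resource (disk) sits behind a bounded worker pool.
        self._disk_stage = ThreadPoolExecutor(max_workers=concurrent_reads)

    def cache_row(self, key, row):
        """Put a row into the in-memory row cache."""
        with self._lock:
            self._row_cache[key] = row

    def read(self, key):
        """Return a Future that resolves to the row stored under key."""
        with self._lock:
            hit = key in self._row_cache
            row = self._row_cache.get(key)
        if hit:
            # Cache hit: completed on the caller's thread, so it never
            # waits behind reads that are blocked on disk.
            done = Future()
            done.set_result(row)
            return done
        # Cache miss: queued on the bounded disk stage.
        return self._disk_stage.submit(self._read_and_cache, key)

    def _read_and_cache(self, key):
        row = self._disk_read(key)
        self.cache_row(key, row)
        return row

    def shutdown(self):
        self._disk_stage.shutdown(wait=True)
```

With concurrent_reads set to N and N slow disk reads in flight, read() on a cached key still returns an already-completed Future, which is the behavior being asked for. A single combined stage would instead queue that read behind the disk reads.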