Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <AANLkTim1sK8Jwp14kQG56AZiw5bm3WOPnrdqqj=fhCEU@mail.gmail.com>
References: <AANLkTi=YbvJhCa8=oLRgtsS993Kuz_1crB-fZibi-cVZ@mail.gmail.com>
	<AANLkTim1sK8Jwp14kQG56AZiw5bm3WOPnrdqqj=fhCEU@mail.gmail.com>
Date: Wed, 11 Aug 2010 11:45:54 -0400
Message-ID: <AANLkTinGeaEPT2SS5VLbT0f0_VAe=Jzk11Q_Vr65Pia5@mail.gmail.com>
Subject: Re: Soliciting thoughts on possible read optimization
From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@cassandra.apache.org
Content-Type: text/plain; charset=ISO-8859-1

On Wed, Aug 11, 2010 at 11:37 AM, Ryan King <ryan@twitter.com> wrote:
> On Tue, Aug 10, 2010 at 8:43 PM, Arya Asemanfar <aryaasemanfar@gmail.com> wrote:
>> I mentioned this today to a couple folks at Cassandra Summit, and thought
>> I'd solicit some more thoughts here.
>> Currently, the read stage includes checking row cache. So if your concurrent
>> reads is N and you have N reads reading from disk, the next read will block
>> until a disk read finishes, even if it's in row cache. Would it make sense
>> to isolate disk reads from cache reads? To either make the read stage be
>> only used on misses, or to make 2 read stages CacheRead and DiskRead? Of
>> course, we'd have to go to DiskRead for mmap since we wouldn't know until we
>> asked the OS.
>> My thought is that stages should be based on resources rather than
>> semantics, but that may be wrong. Logically, I don't think it would make
>> sense to have the read stage bounded in a hypothetical system where there is
>> no IO; it's most likely because of the disk and subsequent IO contention
>> that that cap was introduced.
>> As a possible bonus with this change, you can make other optimizations like
>> batching row reads from disk where the keys were in key cache (does this
>> even make sense? I'm not too sure how that would work).
>
> I think this is a reasonable analysis. The idea of stages in the
> SEDA research is to put bounds around scarce resources. I wouldn't
> call reading from the row cache a scarce resource. I'd expect this
> change to have significant performance improvements for workloads that
> are heavily rowcache-able.
>
> -ryan
>

I think that makes sense. If I understand correctly, the only reads
that would be served purely from the row cache are CL.ONE reads, so
reads at QUORUM or ALL would skip this stage.
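
To make the proposal in this thread concrete, here is a minimal sketch
of the two-stage idea. It is not Cassandra code. The class
TwoStageReader, its row_cache dict, and the disk_read callable are all
hypothetical stand-ins, and Python is used only for brevity. Cache hits
are answered on the caller's thread, and only misses queue on a pool
bounded by the concurrent reads setting. The mmap case Arya mentions is
not modeled.

```python
from concurrent.futures import Future, ThreadPoolExecutor

_MISS = object()  # sentinel so that a cached None still counts as a hit


class TwoStageReader:
    """Hypothetical sketch: row-cache hits never occupy a disk-read slot."""

    def __init__(self, disk_read, concurrent_reads=2):
        self.row_cache = {}         # stand-in for the row cache
        self.disk_read = disk_read  # callable key -> row, assumed to be slow
        # Bounded "DiskRead" stage: at most concurrent_reads disk reads run at once.
        self.disk_stage = ThreadPoolExecutor(max_workers=concurrent_reads)

    def read(self, key):
        # "CacheRead": answered on the caller's thread, not queued behind disk reads.
        row = self.row_cache.get(key, _MISS)
        if row is not _MISS:
            done = Future()
            done.set_result(row)
            return done
        # Miss: queue on the bounded disk stage.
        return self.disk_stage.submit(self._read_from_disk, key)

    def _read_from_disk(self, key):
        row = self.disk_read(key)
        self.row_cache[key] = row  # populate the cache for later reads
        return row
```

With concurrent_reads set to N and N or more disk reads in flight, a
read for a cached key still returns an already completed future, which
is the behavior the thread is asking for.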