From: Robert Newson
To: "user@couchdb.apache.org"
Subject: Re: Couch and Varnish
Date: Sat, 13 Nov 2010 18:45:46 +0000
In-Reply-To: <4085F3E3-7F44-4C45-B607-8C74CC4E4650@gmail.com>

> In any case, when we're in "every 1 request to cache means 1 request to database" situation, "caching" is truly pointless.

Not true. Consider attachments or view query results: checking that the cached result is still fresh is faster than redoing the work (or copying the attachment again). It's only (almost) pointless when fetching documents themselves.

What improvement could be made here? It seems wrong to return a cached copy of a document without checking that it is fresh, and my read of RFC 2616 says we mustn't.

Sent from my iPad

On 13 Nov 2010, at 16:36, "Karel Minařík" wrote:

> Hi,
>
> I am ashamed to reply so late, sorry, I got lost in other stuff on Monday. I'll combine my replies:
>
> On Mon, Nov 8, 2010 at 08:17, Zachary Zolton wrote:
>>>>>> Of course, you'd be stuck with manually tracking the types of URLs to
>>>>>> purge, so I haven't been too eager to try it out yet...
>
> Yes, that's precisely what I'd like to avoid. It's not _that_ hard of course, and Couch provides an awesome entry point for the invalidation in _changes or update_notifier, but still...
>
> On 9.Nov, 2010, at 24:42 , Robert Newson wrote:
>> I think it's clear that caching via ETag for documents is close to
>> pointless (the work to find the doc in the b+tree is over 90% of the
>> work and has to be done for GET or HEAD).
>
> Yes. I wonder if there's any room for improvement on Couch's part. In any case, when we're in "every 1 request to cache means 1 request to database" situation, "caching" is truly pointless.
>
> On Mon, Nov 8, 2010 at 11:11 PM, Zachary Zolton wrote:
>>> That makes sense: if every request to the caching proxy checks the
>>> etag against CouchDB via a HEAD request -- and CouchDB currently does
>>> just as much work for a HEAD as it would for a GET -- you're not going to
>>> see an improvement.
>
> Yes. But that's not the only scenario imaginable. I'd repeat what I wrote to the Varnish mailing list [http://lists.varnish-cache.org/pipermail/varnish-misc/2010-November/004993.html]:
>
> 1. The cache can "accumulate" requests to a certain resource for a certain (configurable?) period of time (1 second, 1 minute, ...) and ask the backend less often -- accelerating throughput.
> 2. The cache can return "possibly stale" content immediately and check with the backend afterwards (in the background, when the n-th next request comes, ...) -- accelerating response time.
>
> It was my impression that at least the first option is doable with Varnish (via some playing with the grace period), but I may be severely mistaken.
>
> On Mon, Nov 8, 2010 at 5:04 PM, Randall Leeds wrote:
>>>> If you have a custom caching policy whereby
>>>> the proxy will only check the ETag against the authority (Couch) once
>>>> per (hour, day, whatever) then you'll get a speedup. But if your proxy
>>>> performs a HEAD request for every incoming request you will not see
>>>> much performance gain.
>
> P-r-e-c-i-s-e-ly. If we can tune Varnish or Squid to not be so "dumb" and check with the backend based on some configs like this, we could use it for proper self-invalidating caching. (As opposed to TTL-based caching, which brings the manual expiration issues discussed above.) Unfortunately, at least based on the answers I got, this just does not seem to be possible.
>
> On Mon, Nov 8, 2010 at 12:06, Randall Leeds wrote:
>>>>> It'd be nice if the "Couch is HTTP and can leverage existing caches and tools"
>>>>> talking point truly included significant gains from etag caching.
>
> P-R-E-C-I-S-E-L-Y. This is, for me, the most important, and embarrassing issue of this discussion. The O'Reilly book has it all over the place: http://www.google.com/search?q=varnish+OR+squid+site:http://guide.couchdb.org. Whenever you tell someone who really knows about HTTP caches "Dude, Couch is HTTP and can leverage existing caches and tools" you can and will be laughed at -- you can get away with mentioning expiration based caching and "simple" invalidation via _changes and such, but... Embarrassing still.
>
> I'll try to do more research in this area, when time permits. I don't believe there's _not_ some arcane Varnish config option to squeeze some performance e.g. in the "highly concurrent requests" scenario.
>
> Thanks for all the replies!,
>
> Karel
>
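
To make the "check the ETag against the authority only once per (hour, day, whatever)" policy from the quoted thread concrete, here is a minimal Python sketch. It is a toy in-memory cache, not Varnish or Squid configuration. The names RevalidatingCache, FakeBackend and demo, and the (status, etag, body) return convention, are assumptions invented for this illustration. The fake backend only stands in for a server that answers a matching ETag with 304.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical convention: fetch(key, if_none_match) -> (status, etag, body)
Fetch = Callable[[str, Optional[str]], Tuple[int, str, bytes]]


@dataclass
class Entry:
    etag: str
    body: bytes
    checked_at: float  # clock value when the backend last confirmed this entry


class RevalidatingCache:
    """Serve cached bodies; revalidate each key at most once per window."""

    def __init__(self, fetch: Fetch, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.window = window
        self.clock = clock
        self.entries: Dict[str, Entry] = {}

    def get(self, key: str) -> bytes:
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry.checked_at < self.window:
            return entry.body  # inside the window: no backend round trip
        # Outside the window (or first request): conditional request.
        status, etag, body = self.fetch(key, entry.etag if entry else None)
        if status == 304 and entry is not None:
            entry.checked_at = now  # still fresh, keep the cached body
            return entry.body
        self.entries[key] = Entry(etag, body, now)
        return body


class FakeBackend:
    """Stand-in for the database: counts calls, honours If-None-Match."""

    def __init__(self) -> None:
        self.docs = {"doc1": ("1-abc", b"hello")}
        self.calls = 0

    def fetch(self, key: str, if_none_match: Optional[str]):
        self.calls += 1
        etag, body = self.docs[key]
        if if_none_match == etag:
            return 304, etag, b""
        return 200, etag, body


def demo() -> Tuple[int, int]:
    backend = FakeBackend()
    now = [0.0]  # fake clock so the example does not need to sleep
    cache = RevalidatingCache(backend.fetch, window=1.0, clock=lambda: now[0])
    for _ in range(100):
        cache.get("doc1")
    calls_in_window = backend.calls
    now[0] = 1.5  # window expired: the next request revalidates
    cache.get("doc1")
    return calls_in_window, backend.calls


if __name__ == "__main__":
    print(demo())
```

In this sketch, 100 requests inside the window cost one backend call, and the first request after the window costs one more. The trade-off is that a document updated inside the window is served stale until the next check, which is exactly the freshness concern raised at the top of this message.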