Date: Mon, 25 Mar 2013 09:48:45 +0000
Subject: Re: Input validation and limits
From: Robert Newson
To: "dev@couchdb.apache.org"

I'll quibble a little over the notion that 'middleware' can occur midway
through request processing at the backend but, in general, yes. My point
was to take Jason's suggestion head on and attempt to achieve consensus
on what CouchDB should include versus exclude.
I should have started a new thread for that rather than immediately
forking this one. To answer the specific 'should we add X?' question it
seemed prudent to ask the general 'what features are appropriate for
couchdb?' question.

To cover your list, in brief, I'd say vhosting, rewriting, throttling,
ip checking are out and authentication and captcha are in, but that's
just my list.

This thread should either be renamed if we think the discussion is about
the general, or we should all stay on topic (myself included, of course)
and discuss the rate-limiting and captcha question. I think
rate-limiting is out of scope but that captcha is in scope (because
authentication in general is in scope). Is captcha technology something
that evolves quite quickly? Would support today be something that our
new quarterly updates could usefully keep pace with?

B.

On 25 March 2013 09:29, Benoit Chesneau wrote:
> On Mon, Mar 25, 2013 at 9:11 AM, Robert Newson wrote:
>> This is a great topic and one that goes to the heart of CouchDB's twin
>> roles as database and web server.
>>
>> Does CouchDB need to directly support every feature that a web server
>> ought to support? Or does CouchDB, by virtue of speaking HTTP, get to
>> stay lean, providing only what must be provided by an origin server in
>> the modern Web, and rely on other, hopefully solid and focused tools,
>> for everything else? Supporting CAPTCHA, in whatever form, seems quite
>> reasonable. It's an extension of our auth model in many respects and
>> something that can't easily be externalized.
>>
>> CouchDB's strength is that it's a database that speaks HTTP. In my
>> mind, it does that for one reason - to integrate with other things
>> that also speak HTTP. That obviously includes browsers but it also
>> includes load balancers, caching proxies, and so on.
>>
>> To the topic at hand I feel that rate limiting and IP blocking is
>> something best done externally, just as I feel about virtual hosting
>> and URL rewriting.
>> Are our log files rich enough to power fail2ban
>> itself? Could they be enhanced if not? Would an iptables approach to
>> rate limiting be preferable? Can we, as the CouchDB developer
>> community, really support and maintain all the extra features if we
>> decided CouchDB-as-a-web-server means it ought to do all these things?
>> Will we work to make a clustered CouchDB work without external load
>> balancers or DNS failover services, to pick just two examples? Will we
>> add an http caching layer?
>>
>> I sound opinionated and entrenched when I ask too many questions in a
>> row, but they are sincere questions; it's not my intention to bludgeon
>> the proposal into the ground with them. I do want to explicitly reject
>> an accusation of "stop energy" before it's made, though. That phrase
>> is easily invoked though I do see that it's often been true in the
>> past, from myself and other developers.
>>
>> Adding this kind of statefulness seems inappropriate to me but it's
>> hard to argue the case when we have the URL rewriting and virtual
>> hosting built in. A separate conversation is looming about virtual
>> hosting because the Nebraska merge that brings clustering will not
>> bring virtual hosting with it; BigCouch has never supported native
>> virtual hosting, it's provided by HAProxy instead.
>>
>> I would love a broader discussion about where CouchDB ends and other
>> software begins. Is there a crisp line? I'd argue there could be,
>> though it's not crisp today. For me, as I've said, CouchDB is a
>> database that you talk to over HTTP. I'm for keeping that as lean as
>> possible; that's a big enough task already.
>>
>> B.
>>
>
> I'm on your side, imo. For a long time I've been thinking we should
> rewrite the way CouchDB handles authentication, vhosting and other
> HTTP-related stuff. What about refactoring the HTTP level to use some
> kind of middleware system to validate or transform the request and
> response:
>
> 1. Accept req
> 2. for middleware in request middleware: do something with req; if
>    needed, return a response
> 3. return response
> 4. for middleware in response middleware: do something with response
>
> So the authentication, vhosting, rewriting and possibly other
> middleware like throttling, ip checking, ... could be added easily or
> even removed when not needed. Something equivalent to mod_* in
> Apache, so CouchDB could offer some by default and let other vendors
> ship the ones they built for a specific case.
>
> Thoughts?
>
> - benoit
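
For illustration, the four middleware steps quoted above could be
sketched roughly as follows. This is a minimal Python sketch, not
CouchDB code (CouchDB itself is written in Erlang). Every name in it
(Request, Response, handle, require_auth, add_marker_header, backend)
is hypothetical. It also assumes that response middleware runs before
the response is returned, and that it runs on short-circuited
responses too; the quoted list does not settle either point.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Request:
    method: str
    path: str
    headers: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""
    headers: dict = field(default_factory=dict)


# A request middleware returns a Response to short-circuit, or None to continue.
RequestMiddleware = Callable[[Request], Optional[Response]]
# A response middleware receives the response and returns a (possibly changed) one.
ResponseMiddleware = Callable[[Request, Response], Response]


def handle(req: Request,
           request_middleware: List[RequestMiddleware],
           response_middleware: List[ResponseMiddleware],
           backend: Callable[[Request], Response]) -> Response:
    resp = None
    # Step 2: each request middleware may inspect req and answer early.
    for mw in request_middleware:
        resp = mw(req)
        if resp is not None:
            break
    # No middleware answered, so the (hypothetical) backend handles the request.
    if resp is None:
        resp = backend(req)
    # Step 4: each response middleware may transform the response.
    for mw in response_middleware:
        resp = mw(req, resp)
    # Step 3: return the response.
    return resp


def require_auth(req: Request) -> Optional[Response]:
    # Hypothetical check: reject any request without an Authorization header.
    if "Authorization" not in req.headers:
        return Response(401, "unauthorized")
    return None


def add_marker_header(req: Request, resp: Response) -> Response:
    # Hypothetical transformation: tag every response that passes through.
    resp.headers["X-Handled-By"] = "sketch"
    return resp


def backend(req: Request) -> Response:
    return Response(200, "ok")


anonymous = handle(Request("GET", "/db"),
                   [require_auth], [add_marker_header], backend)
authed = handle(Request("GET", "/db", {"Authorization": "Basic placeholder"}),
                [require_auth], [add_marker_header], backend)
```

Under these assumptions, removing a feature such as authentication
amounts to dropping require_auth from the request list, which is the
"added easily or even removed" property the proposal is after.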