Reply-To: dev@couchdb.apache.org
MIME-Version: 1.0
Date: Wed, 20 Aug 2014 17:45:58 -0700
Subject: Re: [DISCUSS] Rewriting the CouchDB HTTP Layer
From: Russell Branca
To: dev@couchdb.apache.org, andywenk@apache.org
Content-Type: multipart/alternative; boundary=089e0158cba0442a830501190bbc

--089e0158cba0442a830501190bbc
Content-Type: text/plain; charset=UTF-8

Thanks Andy! The next step I want to take is to build another prototype in
Cowboy and compare it with the WebMachine implementation. Hopefully I will
have some time for that over the weekend.

-Russell

On Aug 20, 2014 5:01 AM, "Andy Wenk" wrote:
> Hey Russell,
>
> I have read your blog post about "Rewriting the CouchDB HTTP Layer". Thanks
> a lot for that!
>
> As a note from a non-core-CouchDB-dev - this sounds great and very
> reasonable. Making the code easier to test, removing unnecessary code
> duplication, organising the code even better and making it easier to write
> plugins are things that will lead to better code and will make it easier
> for devs to contribute. So all thumbs up! Great work! I hope the discussion
> will lead to a good decision :)
>
> Cheers
>
> Andy
>
>
> On 19 August 2014 02:27, Russell Branca wrote:
>
> > On Aug 17, 2014 8:15 PM, "Jason Smith" wrote:
> > >
> > > Hi, Russell. This is okay for a starting point but it is a bit vague.
> > > Could you perhaps flesh out the plan and make it more comprehensive?
> > >
> > > ^^ That is a joke!
> > >
> > > Seriously, thank you very much for this analysis and plan. This is very
> > > exciting!
> > > (Not least because the http codebase is the part I know best and
> > > I can get excited about.)
> > >
> >
> > Thanks!
> >
> > > One quick question that I don't see from your writeup: What version of
> > > CouchDB are you thinking of targeting? 2.0? 2.1? 3.0? Is this completely
> > > an internal change, or does it affect users?
> > >
> >
> > I think this is outside the scope of 2.0 given how close we are on that.
> > There's a fair bit of legwork involved in doing the rewrite, so I wouldn't
> > want to block 2.0. So I think the question is 2.x or 3.0. If we go the
> > WebMachine route we should rework the API and definitely introduce
> > backwards incompatible changes, so we would want to do a 3.0 release
> > there. If we used Cowboy we could mimic the current API and release 2.x.
> >
> > > For me, I am not so interested in an internal rewrite with zero advantage
> > > (besides "it's cleaner"), however I am very interested to use the
> > > rewrite for a better opportunity to explore plugin opportunities or other
> > > extensibility features.
> > >
> >
> > This rewrite needs to happen. The internals of the http layer need some
> > serious love and there's a lot of duplication to remove. I think the big
> > win here is that this would get the ball rolling on taking a closer look at
> > the various internal applications and figuring out what needs to be
> > restructured. For instance, the next logical step after reworking the http
> > layer is to standardize the clustered and local api modules, and there was
> > some great discussion in the Dev channel today about that.
> >
> > The more we can decouple the various apps, the more easily we can extend
> > CouchDB with plugins and new functionality.
> >
> > -Russell
> >
> > >
> > > On Mon, Aug 18, 2014 at 1:41 AM, Russell Branca wrote:
> > >
> > > > # Rewriting the CouchDB HTTP Layer
> > > >
> > > > With the light at the end of the tunnel on the BigCouch merge, I thought
> > > > it was time to get the conversation going on cleaning up the current
> > > > HTTP stack duality. We've got a good opportunity to do some major
> > > > cleanup, remove duplication, and really start more clearly separating
> > > > the various components of CouchDB.
> > > >
> > > >
> > > > ## Primary objectives
> > > >
> > > > * Consolidate down to one HTTP layer
> > > > * Isolate HTTP functionality
> > > > * Separate HTTP server from HTTP resources
> > > > * Easy plugin integration
> > > > * Build clustered/local API
> > > >
> > > >
> > > > ### Consolidate down to one HTTP layer
> > > >
> > > > We currently have two HTTP layers, `couch_httpd` and `chttpd`. This
> > > > was a useful construct when BigCouch was a separate application, where
> > > > isolating the clustered layer from the local layer was necessary.
> > > >
> > > > This is no longer the case, and we can significantly reduce code
> > > > duplication by consolidating down to one HTTP layer. There are a
> > > > number of places in the two apps where the code is nearly identical,
> > > > except one calls out to `fabric` and the other calls out to
> > > > `couch_*`. For instance, compare `couch_httpd_db:couch_doc_open/4` [1]
> > > > with `chttpd_db:couch_doc_open/4` [2]. These are completely identical
> > > > aside from whether the call goes through the clustered layer, `fabric`,
> > > > or through the local layer, `couch_db`.
> > > >
> > > > There are plenty of other places with similar duplication. This is
> > > > obviously ripe for refactoring and for introducing some higher
> > > > level abstractions that make the HTTP layer function independently of
> > > > the document/database level APIs.
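
As a rough illustration of the kind of higher level abstraction described above, here is a minimal sketch in which the shared "open a document" logic is written once and the storage call is passed in as a fun. Everything in it is an assumption made for illustration: the module name `doc_api_sketch` is hypothetical, and the two backend funs are in-memory stand-ins rather than the real `fabric` or `couch_db` calls.

```erlang
-module(doc_api_sketch).
-export([open_doc/4, demo/0]).

%% Hypothetical helper: the one shared copy of the "open a document" logic.
%% OpenFun is whatever actually fetches the document (clustered or local);
%% it is expected to return {ok, Doc} or an error term.
open_doc(OpenFun, DbName, DocId, Options) ->
    case OpenFun(DbName, DocId, Options) of
        {ok, Doc} ->
            Doc;
        Error ->
            throw(Error)
    end.

%% Stand-in backends so the sketch runs on its own. In CouchDB these would
%% wrap the clustered layer and the local layer respectively.
clustered_open(_DbName, <<"missing">>, _Options) ->
    {not_found, missing};
clustered_open(DbName, DocId, _Options) ->
    {ok, {clustered, DbName, DocId}}.

local_open(DbName, DocId, _Options) ->
    {ok, {local, DbName, DocId}}.

demo() ->
    Clustered = open_doc(fun clustered_open/3, <<"db">>, <<"doc1">>, []),
    Local = open_doc(fun local_open/3, <<"db">>, <<"doc1">>, []),
    %% catch turns the thrown error term back into a plain value.
    Missing = (catch open_doc(fun clustered_open/3, <<"db">>, <<"missing">>, [])),
    {Clustered, Local, Missing}.
```

With this shape the HTTP handler never needs to know which layer served the request; it only chooses (or is handed) the fun.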
> > > >
> > > >
> > > > ### Isolate HTTP functionality
> > > >
> > > > I don't think `couch_doc_open/4` has any business existing in
> > > > the HTTP layer; we should move all non-HTTP logic out. IMO the HTTP
> > > > layer should only concern itself with:
> > > >
> > > > 1. Receiving the HTTP requests
> > > > 2. Extracting the request data into a standard data structure
> > > > 3. Dispatching requests to the appropriate internal APIs
> > > > 4. Forwarding the response
> > > >
> > > > Anything that doesn't fit into those four steps should be ripped out
> > > > and moved elsewhere. For instance, the primary logic for determining the
> > > > database redundancy and shard values is done in `chttpd_db` [3]. I
> > > > would greatly prefer to see this logic in a database API.
> > > >
> > > > The more we can isolate HTTP logic from database logic the
> > > > better. Once they are fully decoupled, the HTTP layer is merely
> > > > one particular client interface on top of the core database. We also
> > > > get all the benefits of isolation for testing and whatnot.
> > > >
> > > > Along these lines, I think we greatly overuse the #httpd{} record for
> > > > passing around request data. Instead we should extract the body, and
> > > > then combine all of the user supplied headers and query string params
> > > > into a standard options list. This way we can completely separate
> > > > making database requests from the representation of the client
> > > > request.
> > > >
> > > >
> > > > ### Separate HTTP server from HTTP resources
> > > >
> > > > I think everything I've said so far is pretty clear cut in terms of
> > > > being _the_ logical thing to do, but separating the HTTP server from
> > > > the HTTP endpoints is less clearly defined. However, we do have
> > > > precedent for this and there are a number of solid benefits.
> > > >
> > > > First, let me explain what I mean here.
> > > > There are two pieces to an
> > > > HTTP stack: first there's the core HTTP engine that handles receiving
> > > > and responding to requests and other things along those lines, and
> > > > second there are the places where you supply your business logic and
> > > > figure out what content to send to the user.
> > > >
> > > > CouchDB has a handful of places using this approach, where instead of
> > > > defining all the logic in the HTTP stack directly, we have auxiliary
> > > > modules defined within the appropriate applications that specify how
> > > > any HTTP requests for that application are handled. A good clean
> > > > example of this approach is `couch_mrview_http` [4].
> > > >
> > > >
> > > > ### Easy plugin integration
> > > >
> > > > One big advantage of the above separation of HTTP resources is that it
> > > > provides a standard way for plugins to hook in new HTTP endpoints. The
> > > > more we can treat the "core" CouchDB applications as plugins, the
> > > > easier it is to isolate and replace various parts of the stack.
> > > >
> > > >
> > > > ### Build clustered/local API
> > > >
> > > > The above example of `couch_doc_open/4` is a clear cut case where
> > > > we want to abstract the process of loading a document. Not all places
> > > > are as easily abstractable, but this is a great example of why I think
> > > > we should have a standard API on top of the clustered and local layers,
> > > > where deciding which to use is based on a local/clustered flag, or
> > > > some other heuristic.
> > > >
> > > > I've been toying around with the idea of making a request object of
> > > > some sort, something like `couch_req:make(ReqBody, ReqOptions)`,
> > > > that you can then pass to `couch_doc_api` or some such, but I don't
> > > > have any strong opinions on this.
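
To make the request object idea a little more concrete, here is a minimal sketch, assuming a hypothetical module `couch_req_sketch`. Only the `make(ReqBody, ReqOptions)` shape comes from the post; the tuple representation, the `options_from_http/2` and `get_option/3` helpers, and the `<<"local">>` option name are all invented for illustration.

```erlang
-module(couch_req_sketch).
-export([make/2, options_from_http/2, get_option/3, demo/0]).

%% Hypothetical request object: just the body plus a flat options list,
%% with nothing HTTP-specific left in it.
make(ReqBody, ReqOptions) ->
    {couch_req, ReqBody, ReqOptions}.

%% Combine user supplied headers and query string params into one options
%% list. Query params come first, so they win on lookup when a key appears
%% in both; that precedence is an arbitrary choice for this sketch.
options_from_http(Headers, QueryParams) ->
    QueryParams ++ Headers.

%% Look up one option, falling back to Default when it was not supplied.
get_option(Key, {couch_req, _Body, Options}, Default) ->
    proplists:get_value(Key, Options, Default).

demo() ->
    Headers = [{<<"accept">>, <<"application/json">>}],
    Query = [{<<"rev">>, <<"2-def">>}],
    Req = make(<<"{}">>, options_from_http(Headers, Query)),
    {get_option(<<"rev">>, Req, undefined),
     get_option(<<"accept">>, Req, undefined),
     %% A local/clustered flag could travel in the same options list.
     get_option(<<"local">>, Req, false)}.
```

A document API could then take such a request value and decide between the clustered and local layers from the options alone, without ever seeing the HTTP request record.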
> > > >
> > > >
> > > > ## Where I've gotten so far: chttpd2, a proof of concept
> > > >
> > > > I've hacked out an experimental WebMachine [5] based rewrite of the
> > > > HTTP stack called `chttpd2` [6]. This PoC follows the same ideas I've
> > > > outlined above, so I'll run back through the previously outlined items
> > > > and explain how `chttpd2` handles each.
> > > >
> > > >
> > > > ### Consolidate down to one HTTP layer
> > > >
> > > > Right now I'm not doing anything special here. I still think building
> > > > an API layer that handles deciding whether to make a clustered or
> > > > local request is the proper approach, so I've not included any logic
> > > > in the HTTP stack for doing so.
> > > >
> > > >
> > > > ### Isolate HTTP functionality
> > > >
> > > > I've got a solid separation of functionality in `chttpd2`. If you
> > > > look at the current codebase in [6], there is zero logic for actually
> > > > handling any particular CouchDB requests. Rather, those are
> > > > self-contained within the appropriate sub-applications. I've started
> > > > this for `couchdb-couch` [7] and `couchdb-config` [8]. Here's a simple
> > > > example of the new welcome resource [9].
> > > >
> > > > As you can see, there is zero database logic in the welcome request
> > > > module. In fact, I started moving all the random logic in the current
> > > > HTTP layer to a temporary module I'm calling `couch_api` [10]. As you
> > > > can see from that module, it removes all the logic that was previously
> > > > nested in `couch_httpd_misc_handlers` [11]. More complicated examples
> > > > for creating a database and viewing database info are in [12], and an
> > > > all dbs example is in [13]. Also, I've done similar things for
> > > > `couchdb-config`, as mentioned above in [8].
> > > >
> > > >
> > > > ### Easy plugin integration
> > > >
> > > > As I mentioned above, by making it easy to plug in new HTTP
> > > > endpoints, we also make it easier for plugins to do the same.
> > > > On that
> > > > front I've made it so each application can optionally declare a
> > > > `couch_dispatch` function describing what endpoints it can handle, and
> > > > then `chttpd2` will go and find all of those to figure out how to
> > > > dispatch requests [14]. For example, here's how the
> > > > `couchdb-couch` endpoints are declared [15].
> > > >
> > > >
> > > > ### Build clustered/local API
> > > >
> > > > I have not started on this front, and have only built these endpoints
> > > > for interacting with the clustered layer for simplicity, as this is
> > > > just a proof of concept I hacked together. However, as I mentioned
> > > > above, I've started moving all the logic out of the HTTP layer into
> > > > more appropriate places. I've made similar changes to `couchdb-config`
> > > > by moving all of the logic from [16] into the `couchdb-config`
> > > > application itself.
> > > >
> > > >
> > > > ### Why WebMachine?
> > > >
> > > > I find WebMachine [5] to be one of the more interesting HTTP stacks for
> > > > building webapps. In particular I like how they have a specific flow
> > > > chart [17] where each coordinate point corresponds to a particular
> > > > clause of the `webmachine_decision_core:decision/1` function [18].
> > > >
> > > > That said, I think Cowboy [19] has more momentum and might be a better
> > > > long term project to tie ourselves to.
> > > >
> > > > Also, if we decide to go the WebMachine route, we'll need to
> > > > restructure a fair bit of the current HTTP layer, making a number of
> > > > breaking changes. I'm a strong -1 for coercing WebMachine into the
> > > > current haphazard CouchDB API. WebMachine is very opinionated on how
> > > > you structure your API (for good reason!) and I think going against
> > > > that is a mistake.
> > > >
> > > > So if we wanted to just do a drop in replacement of the current
> > > > CouchDB API, then Cowboy is the way to go. Although one of these days
> > > > we should clean up the HTTP API.
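
The `couch_dispatch` discovery described under "Easy plugin integration" could look roughly like the sketch below. It is not the `chttpd2` code from [14]; the module name `dispatch_sketch`, the arity-0 callback, the `{PathSegments, HandlerModule}` return shape and the handler names are all assumptions made for illustration.

```erlang
-module(dispatch_sketch).
-export([couch_dispatch/0, collect/1, demo/0]).

%% Assumed shape for this sketch only: an arity-0 function returning
%% {PathSegments, HandlerModule} pairs. The handler names are made up.
couch_dispatch() ->
    [{[], welcome_handler},
     {[<<"_all_dbs">>], all_dbs_handler}].

%% Ask each module for its endpoints, skipping modules that do not
%% export couch_dispatch/0, and concatenate the results in order.
collect(Modules) ->
    lists:append([Mod:couch_dispatch() || Mod <- Modules, declares_dispatch(Mod)]).

declares_dispatch(Mod) ->
    %% function_exported/3 only sees loaded modules, so load first.
    _ = code:ensure_loaded(Mod),
    erlang:function_exported(Mod, couch_dispatch, 0).

demo() ->
    %% `lists` stands in for an application with no HTTP endpoints.
    collect([?MODULE, lists]).
```

Because the declaration is optional, a plugin adds endpoints simply by exporting the callback, and applications without HTTP resources are skipped.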
> > > >
> > > >
> > > > # Conclusion
> > > >
> > > > I hope this can start a good discussion on a game plan for the HTTP
> > > > layer. Like I said, this is a proof of concept that I hacked out, so
> > > > I'm not attached to the code or the use of WebMachine, but I do think
> > > > it's a good representation of the ideas outlined above.
> > > >
> > > > Looking forward to hearing your thoughts and comments!
> > > >
> > > >
> > > > #### Footnotes
> > > >
> > > > [1]
> > > > https://github.com/apache/couchdb-couch/blob/master/src/couch_httpd_db.erl#L805-L823
> > > >
> > > > [2]
> > > > https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L886-L904
> > > >
> > > > [3]
> > > > https://github.com/apache/couchdb-chttpd/blob/master/src/chttpd_db.erl#L203-L205
> > > >
> > > > [4]
> > > > https://github.com/apache/couchdb-couch-mrview/blob/master/src/couch_mrview_http.erl
> > > >
> > > > [5] https://github.com/basho/webmachine
> > > >
> > > > [6] https://github.com/chewbranca/chttpd2/tree/initial-branch
> > > >
> > > > [7]
> > > > https://github.com/apache/couchdb-couch/tree/2073-feature-webmachine-http-engine
> > > >
> > > > [8]
> > > > https://github.com/apache/couchdb-config/tree/2073-feature-webmachine-http-engine
> > > >
> > > > [9]
> > > > https://github.com/apache/couchdb-couch/blob/2073-feature-webmachine-http-engine/src/couch_httpr_welcome.erl
> > > >
> > > > [10]
> > > > https://github.com/apache/couchdb-couch/blob/2073-feature-webmachine-http-engine/src/couch_api.erl
> > > >
> > > > [11]
> > > > https://github.com/apache/couchdb-couch/blob/master/src/couch_httpd_misc_handlers.erl#L32-L45
> > > >
> > > > [12]
> > > > https://github.com/apache/couchdb-couch/blob/2073-feature-webmachine-http-engine/src/couch_httpr_db.erl
> > > >
> > > > [13]
> > > > https://github.com/apache/couchdb-couch/blob/2073-feature-webmachine-http-engine/src/couch_httpr_dbs.erl
> > > >
> > > > [14]
> > > > https://github.com/chewbranca/chttpd2/blob/initial-branch/src/chttpd2_config.erl#L26-L33
> > > >
> > > > [15]
> > > > https://github.com/apache/couchdb-couch/blob/2073-feature-webmachine-http-engine/src/couch.erl#L68-L73
> > > >
> > > > [16]
> > > > https://github.com/apache/couchdb-couch/blob/master/src/couch_httpd_misc_handlers.erl#L155-L249
> > > >
> > > > [17]
> > > > https://raw.githubusercontent.com/basho/webmachine/develop/docs/http-headers-status-v3.png
> > > >
> > > > [18]
> > > > https://github.com/basho/webmachine/blob/develop/src/webmachine_decision_core.erl#L158-L595
> > > >
> > > > [19] https://github.com/ninenines/cowboy
> > > >
> > > >
> > > > P.S. I've decided to stop using gists.github.com for posting content,
> > > > as I can never find my posts again and the comments there are a black
> > > > hole. I've instead posted this at:
> > > > http://www.chewbranca.com/tech/2014/08/17/rewriting-the-couchdb-http-layer/
> > > >
> > >
> >
>
> --
> Andy Wenk
> Hamburg - Germany
> RockIt!
>
> GPG fingerprint: C044 8322 9E12 1483 4FEC 9452 B65D 6BE3 9ED3 9588
>
> https://people.apache.org/keys/committer/andywenk.asc
>

--089e0158cba0442a830501190bbc--