couchdb-dev mailing list archives

From Klaus Trainer
Subject Re: [DISCUSS] Deprecating _externals?
Date Wed, 29 Oct 2014 01:25:01 GMT
Hi Joan,

one reason why I've never missed an "official" plugin API is that
CouchDB provides the externals API, which is documented and works well.

Please see my further comments inline.

> Today, someone came to the #couchdb channel asking about
> _externals. For a long while it's been on my mind that perhaps
> we should deprecate the entire _externals feature for a number
> of reasons:
>   1. Couch is not a great reverse proxy. Making it into one is
>      as hard as rewriting nginx or haproxy in erlang. It's a
>      distraction to our development team and far outside our
>      core competency.

From my user perspective, it's a *good enough* reverse proxy.  It does
its job of forwarding HTTP requests and returning the responses.  Beyond
that, I know of no other reverse proxy implementation that has a RESTful
HTTP API for starting and stopping services at runtime and that can also
serve as a central store for the services' configuration.  I don't want
to assert that it's always a good idea to use that combination of
features in place of other, more common solutions, but there are
scenarios where it can provide some significant advantages.
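
To make that combination concrete, here is a minimal sketch of how a
service could be registered and removed at runtime through the 1.x
config API (PUT or DELETE on /_config/{section}/{key}, with a JSON
string as the PUT body).  It only builds the request descriptions and
does not send them.  The service name, command, backend URL, and the
CouchDB address are hypothetical, and a real deployment would also need
admin credentials.  Whether a handler change takes effect without a
restart is an assumption to verify on your version.

```python
import json
from urllib.parse import quote

COUCH = "http://127.0.0.1:5984"  # assumption: a local CouchDB 1.x node


def config_put(section, key, value):
    """Describe a config API PUT; the body is the value as a JSON string."""
    url = f"{COUCH}/_config/{quote(section, safe='')}/{quote(key, safe='')}"
    return ("PUT", url, json.dumps(value))


def config_delete(section, key):
    """Describe a config API DELETE for one key."""
    url = f"{COUCH}/_config/{quote(section, safe='')}/{quote(key, safe='')}"
    return ("DELETE", url, None)


def register_service(name, command, backend_url):
    """Requests that start an OS daemon and proxy /_<name> to it."""
    # Erlang term for the proxy handler, as used in [httpd_global_handlers].
    proxy = '{couch_httpd_proxy, handle_proxy_req, <<"%s">>}' % backend_url
    return [
        config_put("os_daemons", name, command),
        config_put("httpd_global_handlers", "_" + name, proxy),
    ]


def unregister_service(name):
    """Requests that remove the proxy route first, then stop the daemon."""
    return [
        config_delete("httpd_global_handlers", "_" + name),
        config_delete("os_daemons", name),
    ]


if __name__ == "__main__":
    # Hypothetical service: name, command, and port are made up.
    for method, url, body in register_service(
        "my_service", "/usr/bin/python /opt/svc.py", "http://127.0.0.1:8000"
    ):
        print(method, url, body)
```

The same two config sections then double as the stored configuration of
the services, which is the "central storage" aspect mentioned above.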

Also, can you explain in what way doing HTTP and managing external OS
processes is "far outside our core competency" while both are actually
essential to CouchDB's core feature set?

>   2. In a clustered CouchDB (the default in 2.0), the
>      assumptions around externals change drastically. For an 
>      _external to work, it must be stateless and not rely upon
>      multiple sequential requests to hit the same node (assuming
>      the standard n-node cluster + a load balancer/reverse proxy
>      at the front.)

Maybe I'm missing some aspects, but I can't see how "the assumptions
around externals change drastically".  We're using a stateless protocol,
and we've never made any guarantees with regard to people's application
state.  I can only see a straw man here.

>      People who wrote a CouchDB 1.x external could reasonably
>      expect to write an old-school singleton app (i.e., the only
>      copy of that external process running, on a single machine).
>      If they engaged in any of a number of bad behaviours for
>      distributed systems - storing content on local disk, locking
>      or blocking connections to other services/databases in a 
>      "single-threaded" pattern, or even expecting CouchDB not to
>      possibly introduce a conflict or "read your writes" - they
>      will probably fail outright at best, or at worst introduce
>      subtle and confusing behaviour.

Even for an "old-school singleton" web app it's common best practice to
put any application state that's beyond a request/response cycle into
the database and nowhere else.  Assuming you follow that best practice,
I can't see any new problem when it comes to running multiple instances
of such an app in parallel.  Regarding the missing "read your writes"
guarantee and possible conflicts: these are database and not application
properties.  The related problems affect any client, and they are not
specific to using the externals API at all!  Also, they are not new
insofar as these problems already exist today as soon as replication
comes into play.
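
As an illustration of that best practice, here is a minimal sketch in
which an app instance keeps nothing between requests except a handle to
the database.  The DictStore class is a hypothetical in-memory stand-in,
not a CouchDB client, and the request/response shapes are made up for
the example.

```python
class DictStore:
    """Hypothetical in-memory stand-in for the shared database."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id, default=None):
        return self._docs.get(doc_id, default)

    def put(self, doc_id, doc):
        self._docs[doc_id] = doc


class CounterApp:
    """One app instance; its only attribute is the store handle."""

    def __init__(self, store):
        self.store = store

    def handle(self, request):
        # All state that outlives this request is read from and written
        # back to the store, never kept on the instance.
        doc_id = "visits:" + request["user"]
        previous = self.store.get(doc_id, {"count": 0})
        updated = {"count": previous["count"] + 1}
        self.store.put(doc_id, updated)
        return {"status": 200, "body": updated}


if __name__ == "__main__":
    shared = DictStore()
    # Two instances standing in for two nodes behind a load balancer.
    node_a, node_b = CounterApp(shared), CounterApp(shared)
    node_a.handle({"user": "joan"})
    print(node_b.handle({"user": "joan"}))
```

Because either instance can serve any request, it makes no difference
which node a request lands on.  The read-modify-write in handle() can
still race between instances, but with a real database that would show
up as a conflict for any client, not only for an external.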

> TL;DR: We're changing the contract we give to _externals in a
> reverse-compatibility-breaking way. We either need to document it
> straight up, along with all of the admonishments required for
> people who expect it to operate the same as in 1.x, or we need to
> remove it.

What contract are you talking about?  Unless you have something
specific, I will assume that possible changes in semantics, which will
only occur in combination with some application-specific behaviour
anyway, are already dealt with by a major version number increment.

> My opinion is that now that the default CouchDB rollout will be
> a cluster with a reverse proxy, that _externals should be exposed
> through the load balancer, which can then reference 1 or more
> processes distributed either on the same CouchDB nodes, or on
> different hosts should compute needs demand it.
> The exception here would be a single-node CouchDB, which could
> still use the same approach. However I don't see the issue with
> deploying an haproxy on that same node and using the same approach
> I describe above.

I do see an issue, especially with regard to small applications that
suddenly need an additional piece of infrastructure that has to be
configured and maintained.  Maybe I'm just naive, but I can't see the
large burden you seem to suggest that would justify removing that
feature.  That is, I *am* willing to accept a tradeoff if it seems
worth it, but in this case I'm not quite convinced.

