couchdb-user mailing list archives

From "Paul Davis" <>
Subject Re: joins, reprise, and a suggestion for _external enhancement.
Date Tue, 11 Nov 2008 22:59:36 GMT
On Tue, Nov 11, 2008 at 5:41 PM, Antony Blakey <> wrote:
> On 12/11/2008, at 7:02 AM, Dean Landolt wrote:
>> On Tue, Nov 11, 2008 at 3:08 PM, Paul Davis
>> <> wrote:
>>> I think this is an interesting idea, and has mostly been done with
>>> client libraries. ATM, I'm leaning towards saying that this is a
>>> client extension and doesn't really belong in couch. There are a crap
>>> load of optimizations that clients could make that couch couldn't.
>>> I have some ideas running around in my head about doing object graph
>>> loading etc. Things really start to get fun on the client when you
>>> contemplate referencing other databases etc.
>>> Anyway, if you can come up with some part of this functionality that
>>> *must* be done server side and has a big enough use case, ideas for
>>> patches are always welcome :D
>> I didn't think of it that way, but I agree. Perhaps a querytools plugin
>> would be in order when the plugin system lands, but this is probably best
>> left to the client. Does anybody know if jquery.couch does something like
>> this? If not, I'll have a go at hacking it in.
> I've written a plugin (in Erlang) that allows joins by tracking updates and
> replicating keys to Mnesia. It has the same behaviour as the existing
> map-reduce views e.g. updates only on view request. You can just as easily
> use e.g. SQLite rather than Mnesia.
> I did this to allow arbitrary join queries against my model without doing it
> on the client, and to centralise caching for high-performance joins such as
> transitive User->Permission checks when using a User/Role/Permission model
> that doesn't want both relationships to be stored on the Role object.
> I can separately compile and deploy this using the simple change to
> bin/couchdb that I described in an earlier post. It involves no change to
> the rest of CouchDB.
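Below is a minimal sketch of the kind of join index described above. It uses Python's built-in sqlite3 module, in the spirit of "you can just as easily use e.g. SQLite rather than Mnesia". The table layout, the function names (open_join_index, index_user, index_permission, user_has_permission) and the one-row-per-relationship-key design are assumptions for illustration only, not taken from the actual plugin. The change-tracking side, which feeds updated documents in, is left out.

```python
import sqlite3

# Hypothetical schema: each row mirrors one relationship key taken from a
# CouchDB document. User docs list their roles; Permission docs list the
# roles that grant them, so the Role doc itself stores neither relationship.
SCHEMA = """
CREATE TABLE user_roles (user_id TEXT NOT NULL, role_id TEXT NOT NULL);
CREATE TABLE role_perms (role_id TEXT NOT NULL, perm_id TEXT NOT NULL);
CREATE INDEX user_roles_by_user ON user_roles (user_id);
CREATE INDEX role_perms_by_role ON role_perms (role_id);
"""


def open_join_index(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def index_user(conn, user_id, roles):
    """Replace the role keys recorded for one (changed) User document."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM user_roles WHERE user_id = ?", (user_id,))
        conn.executemany("INSERT INTO user_roles VALUES (?, ?)",
                         [(user_id, role) for role in roles])


def index_permission(conn, perm_id, roles):
    """Replace the role keys recorded for one (changed) Permission document."""
    with conn:
        conn.execute("DELETE FROM role_perms WHERE perm_id = ?", (perm_id,))
        conn.executemany("INSERT INTO role_perms VALUES (?, ?)",
                         [(role, perm_id) for role in roles])


def user_has_permission(conn, user_id, perm_id):
    """Transitive User -> Role -> Permission check as a single SQL join."""
    row = conn.execute(
        """SELECT 1
             FROM user_roles AS ur
             JOIN role_perms AS rp ON rp.role_id = ur.role_id
            WHERE ur.user_id = ? AND rp.perm_id = ?
            LIMIT 1""",
        (user_id, perm_id),
    ).fetchone()
    return row is not None


# Example data standing in for documents read from CouchDB.
conn = open_join_index()
index_user(conn, "alice", ["editor"])
index_user(conn, "bob", ["reader"])
index_permission(conn, "publish", ["editor", "admin"])
```

Because the join keys live in their own indexed store, the permission check is a single query. The alternative is several document fetches followed by a join on the client.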
> *** Suggestion ***
> Now that I've done it, I realize that it's overkill, and I'm abandoning that
> approach, not only because very few people are going to want to do this in
> Erlang, but also because the same effect can be gained using _external. The
> single optimization required is for a request coming through _external to
> carry the db seqno for the request (presuming the _external request is
> qualified by a db). This lets you skip updating your external index, without
> making any request to CouchDB, if there have been no changes since the last
> request. This is important because making an additional HTTP request to
> CouchDB on every view request is a performance hit.
> You should also use both a startkey and an endkey in the _all_docs_by_seq
> request, which in effect gives the same semantics as map-reduce views, i.e.
> you don't get into any race conditions during updating because you see one
> particular MVCC snapshot.
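The following sketch shows the seqno-gated update described above. It assumes the handler learns the database's current seqno with each request. SeqGatedIndex, fetch_rows and the (seq, doc_id) row shape are hypothetical. fetch_rows stands in for the HTTP GET on _all_docs_by_seq. Check whether startkey and endkey are inclusive against the CouchDB version in use. Re-reading a boundary row does no harm here, because re-indexing a document is idempotent in this sketch.

```python
from urllib.parse import parse_qs, urlencode


class SeqGatedIndex:
    """Hypothetical external index that catches up only when the db changed."""

    def __init__(self, fetch_rows):
        # fetch_rows(query_string) -> iterable of (seq, doc_id) rows.
        # Stands in for GET /<db>/_all_docs_by_seq?<query_string> (assumption).
        self.fetch_rows = fetch_rows
        self.indexed_seq = 0   # highest seqno already reflected in the index
        self.docs = {}         # doc_id -> seqno at which it was last indexed
        self.fetches = 0       # how many times CouchDB was actually queried

    def refresh(self, request_seq):
        """Bring the index up to request_seq; return True if a fetch happened."""
        if request_seq <= self.indexed_seq:
            # No changes since the last request: skip the HTTP round trip.
            return False
        # Bound the range on both ends so this update covers exactly the
        # changes up to the request's seqno, even if more writes land meanwhile.
        query = urlencode({"startkey": self.indexed_seq, "endkey": request_seq})
        for seq, doc_id in self.fetch_rows(query):
            # A real handler would GET the document and update its own store.
            self.docs[doc_id] = seq
        self.indexed_seq = request_seq
        self.fetches += 1
        return True


# Demo: an in-memory stand-in for the by-seq listing (one row per doc,
# at the seqno of its latest update).
LOG = [(1, "b"), (2, "c"), (3, "a")]


def fake_fetch(query):
    q = parse_qs(query)
    lo, hi = int(q["startkey"][0]), int(q["endkey"][0])
    # Treat startkey as exclusive and endkey as inclusive for this demo.
    return [(s, d) for s, d in LOG if lo < s <= hi]


index = SeqGatedIndex(fake_fetch)
first = index.refresh(3)    # catches up over seqnos 1..3
second = index.refresh(3)   # same seqno again, so no fetch is made
```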
> You can synthesize this approach by simultaneously using a notification
> listener and arranging for that process to talk to your _external handler,
> but given that the Erlang endpoint has easy access to the seqno, why not
> supply it to the external process and avoid the hassle?
> notifications is necessary if you are going to use something like memcached,
> or some query mechanism that doesn't go through CouchDB (and you want to
> avoid requesting the db seqno on every request).

Adding the seq num to the external protocol would be trivial and
you've made more than enough of a case for it IMO.
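Here is a skeleton of an _external handler that uses a seqno carried on each request, as proposed in this thread. It assumes the line-oriented _external protocol: one JSON request object per line on stdin, and one JSON response object per line on stdout, with code and json members. The request field name update_seq is a hypothetical placeholder for the proposed addition. The real catch-up step (an _all_docs_by_seq range fetch) is elided.

```python
import io
import json
import sys


def handle(req, state):
    """Answer one request, refreshing the index only if the seqno moved."""
    seq = req.get("update_seq")  # hypothetical field proposed in this thread
    if seq is not None and seq > state["indexed_seq"]:
        # Catch up the external index here (elided in this sketch).
        state["indexed_seq"] = seq
        state["refreshes"] += 1
    return {"code": 200, "json": {"indexed_seq": state["indexed_seq"]}}


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON request per line and write one JSON response per line."""
    state = {"indexed_seq": 0, "refreshes": 0}
    for line in stdin:
        if not line.strip():
            continue
        resp = handle(json.loads(line), state)
        stdout.write(json.dumps(resp) + "\n")
        stdout.flush()  # the caller waits for each response line
    return state


# Example: two requests at the same seqno trigger only one refresh.
demo_in = io.StringIO('{"update_seq": 7}\n{"update_seq": 7}\n')
demo_out = io.StringIO()
demo_state = serve(demo_in, demo_out)
```

When deployed as the external process, the script would call serve() with the default stdin and stdout.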

> Maybe a different external that enforces a db qualifier, such as
> _external_view, would be appropriate. There are further optimisations I
> considered, such as enriching the _external protocol to allow the external
> process to perform the _all_docs_by_seq request (and subsequent document
> GETs) over the port, but on reflection the minimal change is preferable.

I contemplated something of this nature too, but it seems like it'd be
a PITA to get right and would bring fairly little benefit. In the
future, if we start hitting a huge penalty for things like action
scripts getting bogged down in pure connection overhead, it might be a
worthwhile optimization, but for the moment I don't think it should be a concern.

> Finally, by not doing this in Erlang it is easier to conceive of a mechanism
> that allows you to deploy the _external handler via CouchDB itself. Doing
> the plugin in Erlang introduces the Subject/Object problem, which, while it
> can be overcome, is a lot more fraught IMO. Paul suggested this for Erlang
> plugins, but I think it's more applicable to this scenario.

I'm still grappling with this problem. I'm kinda more or less thinking
that given enough time, the easiest most bestest way to distribute
plugins is going to either be via javascript, or erlang once the
erlang infrastructure is set up. I should note that I see this part of
the infrastructure way down the road. As in, I have a shimmering
vision of something like firefox's add-on support where the program
itself can download and install new plugins etc. Obviously it's gonna
be a long ass time before we can tell people to just download the XYZ
Plugin from the Pillow Factory.

> Antony Blakey
> -------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
> Did you hear about the Buddhist who refused Novocain during a root canal?
> His goal: transcend dental medication.

