incubator-couchdb-user mailing list archives

From Antony Blakey <antony.bla...@gmail.com>
Subject Re: joins, reprise, and a suggestion for _external enhancement.
Date Tue, 11 Nov 2008 22:41:40 GMT

On 12/11/2008, at 7:02 AM, Dean Landolt wrote:

> On Tue, Nov 11, 2008 at 3:08 PM, Paul Davis
> <paul.joseph.davis@gmail.com> wrote:
>
>> I think this is an interesting idea, and has mostly been done with
>> client libraries. ATM, I'm leaning towards saying that this is a
>> client extension and doesn't really belong in couch. There are a crap
>> load of optimizations that clients could make that couch couldn't.
>>
>> I have some ideas running around in my head about doing object graph
>> loading etc. Things really start to get fun on the client when you
>> contemplate referencing other databases etc.
>>
>> Anyway, if you can come up with some part of this functionality that
>> *must* be done server side and has a big enough use case, ideas for
>> patches are always welcome :D
>
>
> I didn't think of it that way, but I agree. Perhaps a querytools
> plugin would be in order when the plugin system lands, but this is
> probably best left to the client. Does anybody know if jquery.couch
> does something like this? If not, I'll have a go at hacking it in.

I've written a plugin (in Erlang) that allows joins by tracking
updates and replicating keys to Mnesia. It has the same behaviour as
the existing map-reduce views, e.g. the index is updated only on view
request. You can just as easily use e.g. SQLite rather than Mnesia.

I did this to allow arbitrary join queries against my model without  
doing it on the client, and to centralise caching for high-performance  
joins such as transitive User->Permission checks when using a User/ 
Role/Permission model that doesn't want both relationships to be  
stored on the Role object.
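
To make the User->Permission case concrete, here is a minimal sketch of
such a join against an external SQLite index, written in Python. The
table names, column names, and the build_index/has_permission helpers
are all assumptions made for illustration; they are not the schema or
code of the plugin described above.

```python
import sqlite3


def build_index(user_roles, role_permissions):
    """Build an in-memory index from (user, role) and (role, permission) pairs.

    Table and column names are hypothetical, chosen only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_role (user_id TEXT, role_id TEXT)")
    conn.execute("CREATE TABLE role_permission (role_id TEXT, perm_id TEXT)")
    conn.executemany("INSERT INTO user_role VALUES (?, ?)", user_roles)
    conn.executemany("INSERT INTO role_permission VALUES (?, ?)", role_permissions)
    return conn


def has_permission(conn, user_id, perm_id):
    """Transitive check: does any role held by the user grant the permission?"""
    row = conn.execute(
        "SELECT 1 FROM user_role ur "
        "JOIN role_permission rp ON rp.role_id = ur.role_id "
        "WHERE ur.user_id = ? AND rp.perm_id = ? LIMIT 1",
        (user_id, perm_id),
    ).fetchone()
    return row is not None
```

Each relationship lives in its own table, so neither has to be stored on
the Role document, and the check is a single two-table join.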

I can separately compile and deploy this using the simple change to  
bin/couchdb that I described in an earlier post. It involves no change  
to the rest of CouchDB.

*** Suggestion ***

Now that I've done it, I realize that it's overkill, and I'm
abandoning that approach, not only because very few people are going
to want to do this in Erlang, but also because the same effect can be
gained using _external. The single optimization required is for a
request coming through _external to carry the db seqno for the request
(presuming the _external request is qualified by a db). If there have
been no changes since the last request, you can then skip updating
your external index without making any request to CouchDB. This is
important because it's a performance hit if you have to make an
additional HTTP request to CouchDB on every view request.
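
As a rough sketch of what the external process could do with that
seqno, here is a Python handler that refreshes its index only when the
seqno has advanced. The "db_seq" request field is hypothetical (it
stands in for the proposed addition), and the one-JSON-object-per-line
framing and the "code"/"json" response keys are my assumptions about the
_external protocol, not a specification of it.

```python
import json
import sys


class ExternalIndex:
    """Toy external index that remembers the last db seqno it has indexed."""

    def __init__(self):
        self.last_seq = 0
        self.refresh_count = 0

    def refresh(self, db_seq):
        # Placeholder: a real handler would fetch the changed documents
        # from CouchDB here and update Mnesia, SQLite, or similar.
        self.refresh_count += 1
        self.last_seq = db_seq

    def handle(self, request):
        # "db_seq" is a HYPOTHETICAL field carrying the db seqno.
        db_seq = request["db_seq"]
        if db_seq > self.last_seq:
            # Only now is a round trip to CouchDB needed.
            self.refresh(db_seq)
        return {"code": 200, "json": {"indexed_seq": self.last_seq}}


def serve(index, lines=sys.stdin, out=sys.stdout):
    """Read one JSON request per line and write one JSON response per line."""
    for line in lines:
        if not line.strip():
            continue
        out.write(json.dumps(index.handle(json.loads(line))) + "\n")
        out.flush()
```

When the seqno is unchanged, handle() answers from the existing index
and never contacts CouchDB, which is the saving described above.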

You should also use both a startkey and an endkey in the  
_all_docs_by_seq request, which in effect gives the same semantics as  
map-reduce views i.e. you don't get into any race conditions during  
updating because you see one particular MVCC snapshot.
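
A bounded request of that kind might be built as in the Python sketch
below. The changes_url helper, the base URL, and the db name are
placeholders of mine. Whether the startkey and endkey bounds are
inclusive is not something I state above, so check that before relying
on it.

```python
from urllib.parse import urlencode


def changes_url(base_url, db, last_seq, db_seq):
    """URL for the changes between the last indexed seqno and the request's seqno.

    last_seq: seqno the external index has already processed.
    db_seq:   seqno supplied with the current request (the snapshot to stop at).
    """
    query = urlencode({"startkey": last_seq, "endkey": db_seq})
    return f"{base_url}/{db}/_all_docs_by_seq?{query}"
```

For example, changes_url("http://localhost:5984", "mydb", 10, 25) asks
only for the range from seqno 10 to seqno 25, so writes that land after
seqno 25 are not seen in this pass.
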

You can synthesize this approach by simultaneously using a
notification listener and arranging for that process to talk to your
_external handler, but given that the Erlang endpoint has easy access
to the seqno, why not supply it to the external process and avoid the
hassle? Listening for notifications is necessary if you are going to
use something like memcached, or some query mechanism that doesn't go
through CouchDB (and you want to avoid requesting the db seqno on
every request).

Maybe a different external that enforces a db qualifier, such as  
_external_view, would be appropriate. There are further optimisations  
I considered, such as enriching the _external protocol to allow the  
external process to perform the _all_docs_by_seq request (and  
subsequent document GETs) over the port, but on reflection the minimal  
change is preferable.

Finally, by not doing this in Erlang it is easier to conceive of a
mechanism that allows you to deploy the _external handler via CouchDB
itself. Doing the plugin in Erlang introduces the Subject/Object
problem, which, while it can be overcome, is a lot more fraught IMO.
Paul suggested this for Erlang plugins, but I think it's more
applicable to this scenario.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Did you hear about the Buddhist who refused Novocain during a root  
canal?
His goal: transcend dental medication.


