incubator-couchdb-dev mailing list archives

From "Paul Davis" <paul.joseph.da...@gmail.com>
Subject Re: Architectural advice requested, and some 1.0 suggestions.
Date Thu, 06 Nov 2008 04:45:53 GMT
On Wed, Nov 5, 2008 at 11:12 PM, Antony Blakey <antony.blakey@gmail.com> wrote:
>
> On Tue, Nov 4, 2008 at 6:47 PM, Antony Blakey <antony.blakey@gmail.com>
> wrote:
>
>> My solution is to use the notification mechanism to maintain (multiple)
>> SQLite databases containing the document keys I need to search over. In
>> the
>> URP example, I store (db, src, dest) records. I also store the last seqno,
>> so that I can do incremental updates to SQLite.
>>
>> Then, using _external, I can make queries of SQLite to get me either
>> (user,
>> permission) pairs using a self-join, or in the case of the arbitrary
>> metadata queries, a list of document ids.
>>
>> The primary difficulties I have with this simple model are:
>>
>> 1) The notification mechanism doesn't give me enough information.
>> Currently
>> I have to do a _all_docs_by_seq and check for deletions by attempting to
>> get
>> each document, which I have to do for every document in any case (unless I
>> use transparent id's) to determine if I'm interested in it, and then get
>> the
>> data. I presume this technique works because deletion is actually a form
>> of
>> update until compaction (I copied it from GeoCouch).
>>
>> ** SUGGESTION ** I would prefer an update interface that gave me a stream
>> of
>> (old, new) document pairs, which covers add/update/delete, plus a (from,
>> to)
>> seqno pair. Having a 'from' seqno lets me know when I have to trigger a full
>> rescan, which I need to do in a variety of circumstances such as
>> configuration change.
>
> In retrospect this is not a good idea. I think notification handlers should
> do nothing more than mark a view or query source as dirty, invalidate a
> cache such as memcached, and possibly check for mods to a config document to
> enable/disable the query service. The _external/plugin query handler should
> do the subsequent processing and update any private data structures or
> on-disk indexes, just as the map/reduce views do, and for the same reason.
> So I don't think the notification mechanism should be changed.
>
> However, that raises a question about external view/query server updating:
> should a view update to the seqnum that is current when the request is
> received, or should it keep looping in an update cycle until the seqnum has
> reached a steady state?
>
> The former would make sense if you just wanted to ensure that the view was
> up-to-date with records a client might have just written in the requesting
> thread, whilst the latter would seem to potentially block forever depending
> on the amount of processing required to update the external view and the
> update rate.
>
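
The SQLite scheme Antony describes above might be sketched roughly as follows. This is a hedged illustration, not his actual code: the table and column names (`edges`, `checkpoint`) and the sample data are assumptions; only the idea of storing (db, src, dest) records plus the last seqno, and answering (user, permission) queries with a self-join, comes from the message.

```python
import sqlite3

# Hypothetical auxiliary index: (db, src, dest) records plus the last
# CouchDB sequence number processed, as described in the quoted text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (db TEXT, src TEXT, dest TEXT);
    CREATE TABLE checkpoint (seq INTEGER);
""")
conn.execute("INSERT INTO checkpoint (seq) VALUES (0)")

def record_edge(db, src, dest, seq):
    """Store one (db, src, dest) record and advance the checkpoint."""
    conn.execute("INSERT INTO edges VALUES (?, ?, ?)", (db, src, dest))
    conn.execute("UPDATE checkpoint SET seq = ?", (seq,))

def user_permissions(db):
    """Self-join to produce (user, permission) pairs via a middle node."""
    return conn.execute(
        "SELECT a.src, b.dest FROM edges a JOIN edges b "
        "ON a.dest = b.src WHERE a.db = ?", (db,)).fetchall()

record_edge("mydb", "alice", "editors", 1)
record_edge("mydb", "editors", "write", 2)
print(user_permissions("mydb"))  # [('alice', 'write')]
```

The stored checkpoint seqno is what allows incremental updates: on the next notification, the indexer only has to process changes after that sequence number.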

I haven't considered all of the possible ramifications of what
information should be presented to update-notification processes, but
my current feelings, in my rather tired state, are in no particular
order:

1. Update notifications should at a bare minimum cover DB
create/update/delete. IIRC, create was missing but only needed a
minimal patch; I'm not sure if it's been committed or not.
2. View resets may be worth adding to the notifications.
3. Following from 2, view updates may lend credence to adding a
notification that says "view updated to seq N".

All of those I could see as having particular use cases.
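
In the spirit of the "do nothing more than mark something dirty" point above, a minimal notification handler might look like the sketch below. This is an assumption-laden illustration: CouchDB feeds an update-notification process one JSON object per line on stdin, and the `"type"`/`"db"` field names here follow that shape, but the set of type values shown is hypothetical.

```python
import json

# Minimal sketch of an update-notification handler: defer all indexing
# work and just remember which databases have changed since the last
# query. Heavier processing belongs in the _external handler itself.
dirty = set()

def handle(line):
    note = json.loads(line)
    if note.get("type") in ("created", "updated", "deleted"):
        # Don't touch the index here; just flag the db as dirty.
        dirty.add(note["db"])

handle('{"type": "updated", "db": "users"}')
print(dirty)  # {'users'}
```

In a real process the handler would loop over `sys.stdin` and the `_external` query server would consult (and clear) the dirty set at query time.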

As to updating, if I'm not mistaken, views internally will update to
the latest sequence that was available when the update started. (The
thought being that if they access the btree in one go, the consistent
read state means they won't see new updates until the next read
request. I'm not sure whether incoming reads during an update reset
this, though.)

When we index outside of Erlang, we don't have the consistent
read-state guarantee if we page through the _all_docs_by_seq view as
per the usual design pattern. We could read the entire view into
memory, but that has the obvious not-scalable side effect. Thus, by
default, all external indexers (external == accessing CouchDB via
HTTP) would have the second form of updating, looping until they
reach a steady state. This indeed could lead to race conditions, with
an indexer never managing to stay quite in sync.
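
The "loop until steady state" behavior described above can be sketched as below. The `fetch_changes` function is a placeholder standing in for paging through `_all_docs_by_seq` over HTTP (e.g. with a `startkey` of the last processed seqno); the toy harness is purely illustrative.

```python
# Sketch of updating an external index until the seqno stops moving.
# fetch_changes(since, limit) is a hypothetical stand-in for paging
# GET /db/_all_docs_by_seq?startkey=<since>&limit=<limit>; it returns
# (rows, last_seq). apply_row updates the external index for one row.

def update_until_steady(index_seq, fetch_changes, apply_row, limit=100):
    """Page through changes until no new sequence numbers appear."""
    while True:
        rows, last_seq = fetch_changes(index_seq, limit)
        if not rows:
            return index_seq          # steady state reached
        for row in rows:
            apply_row(row)            # update the external index
        index_seq = last_seq          # checkpoint and loop again

# Toy harness: three changes, fetched one page (of two) at a time.
changes = [(1, "a"), (2, "b"), (3, "c")]
def fetch(since, limit):
    page = [c for c in changes if c[0] > since][:limit]
    return page, (page[-1][0] if page else since)

seen = []
final = update_until_steady(0, fetch, lambda r: seen.append(r[1]), limit=2)
print(final, seen)  # 3 ['a', 'b', 'c']
```

The race Paul mentions is visible in this structure: if writes keep arriving faster than `apply_row` can process a page, the loop never observes an empty page and the indexer never quite catches up.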

> Finally, does anyone have advice about the merits of mnesia vs. sqlite (or
> another sql db) for this kind of auxiliary indexing?
>

I'd say it really depends on what you're wanting to accomplish. For a
while I have contemplated the relative awesome/fail aspects of having
an mnesia layer that treated CouchDB as its permanent store and
exported some sort of HTTP query API. I'm pretty sure I've convinced
myself it wouldn't be general enough to be worth supporting. That
being said, using it for a specific, well-defined role probably isn't
out of the question.

As for the specifics of mnesia vs. SQLite, I couldn't tell you. My
guess is that the mnesia integration would kick the crap out of the
SQLite integration, seeing as mnesia is part of the core library. But
you mentioned GIS stuff, and I haven't a clue about mnesia's support
for such things. Also, mnesia has that whole horizontal-scale thing
baked in.

Hopefully that helps more than confounds,
Paul

> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> Lack of will power has caused more failure than lack of intelligence or
> ability.
>  -- Flower A. Newhouse
>
>
