couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Update notifications including update sequence
Date Mon, 19 Jan 2009 05:21:04 GMT
On Sun, Jan 18, 2009 at 11:56 PM, Antony Blakey <antony.blakey@gmail.com> wrote:
>
> On 19/01/2009, at 2:53 PM, Paul Davis wrote:
>
>> On Sun, Jan 18, 2009 at 10:51 PM, Antony Blakey <antony.blakey@gmail.com>
>> wrote:
>>>
>>> I've previously posted a solution using _external that doesn't hit couch
>>> every update, and that maintains MVCC consistency and lazy-update view
>>> behaviour.
>>>
>>
>> Right. I tried looking through mark mail for a link to your
>> implementation but came up empty handed. I'd contemplated something
>> similar as well. The issue though is that Lucene index writers are
>> AFAIK not reentrant.
>
> Thread 'couchdb' started by Tim Parkin around 20/21 December.
>

Odd. I only noticed the last 2 or 3 posts of that thread before.
Thanks for the tip.

> IndexWriters are mutexed using a lock file.
>

Ew.

>> Thus the headache of coordinating multiple random
>> processes would start to suck. Lots.
>
> My reading of the code was that there was a single process for each
> _external definition (although admittedly that was early in my understanding
> of gen_server). Major consistency issues result if requests to the _external
> aren't serialized.
>

There can be many _external processes for a single definition. So not
only are requests not serialized, they can also run concurrently.
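
A rough sketch of how several independent OS processes could serialize
their index writes through a lock file. This is generic Python, not
Lucene's actual locking code and not anything CouchDB ships; lock_file,
its path argument, and the timeout/poll parameters are hypothetical
names chosen for illustration.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, timeout=5.0, poll=0.05):
    """Hold an exclusive lock represented by the existence of `path`.

    Hypothetical helper: whichever process creates the file first owns
    the lock; everyone else polls until it disappears or they time out.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails atomically if the file already exists,
            # so only one process can succeed at a time.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll)
    try:
        yield
    finally:
        # Release the lock: close our handle and remove the marker file.
        os.close(fd)
        os.remove(path)
```

Each process would wrap its index update in "with lock_file(...)".
The obvious weakness is that a process that crashes while holding the
lock leaves a stale file behind, which is part of why coordinating
many processes this way gets painful.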

>>> The problem with using notifications is lack of snapshot coordination
>>> between the update process and the external process.
>>>
>>
>> I'd say this is use case dependent.
>
> It does mean that you can't guarantee that an external request (that does
> reference a given MVCC snapshot) is getting data from the same snapshot.
>
> You're right that's use case dependent, but the issue is whether the use
> case is 'free text indexing' or is a client use case. If the latter, then you
> need to handle the situation where it *does* matter, so an implementation
> that has random characteristics is IMO less than optimal.
>

Err, right. It's use case dependent. If your (client defined) use case
requires certain characteristics, the update_notification/_external
process may just not be the right tool for the job.

>>> The synchronisation between sequential _external calls is obvious e.g.
>>> guaranteeing that the _external process sees a monotonic increasing
>>> update_seq.
>>>
>>
>> I don't follow.
>
> I mean you'll never get a request in the context of an update_seq that your
> _external process has already advanced beyond, because the update_seqs seen
> by the external are a) serialized and b) only see a monotonic increasing
> sequence of update_seq values. Hence you can safely run an update process
> and set a 'last_update_seq_seen' (which is the key to avoiding hitting couch
> again) knowing that you never have to backtrack.
>

A single _external process should only see monotonically increasing
update_seqs. I think it's technically possible for a smaller
update_seq to be processed later in time in a different OS process
though (later by no more than a few ms).
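
A minimal sketch of the 'last_update_seq_seen' bookkeeping under
discussion, assuming update_seq is a plain integer. SeqTracker,
observe() and the three result strings are hypothetical names, not
part of CouchDB or the _external protocol.

```python
class SeqTracker:
    """Track the highest update_seq one _external process has handled.

    Assumption: update_seq values are plain integers that only grow as
    the database is updated.
    """

    def __init__(self, start=0):
        self.last_update_seq_seen = start

    def observe(self, update_seq):
        """Classify a request's update_seq against what was already seen."""
        if update_seq > self.last_update_seq_seen:
            # New updates exist: the index has to catch up before
            # answering, and the high-water mark moves forward.
            self.last_update_seq_seen = update_seq
            return "catch_up"
        if update_seq == self.last_update_seq_seen:
            # Already up to date, so no need to hit couch again.
            return "current"
        # The request refers to a snapshot older than the index state.
        return "stale"
```

If requests to one process really are serialized, observe() never
returns "stale" and there is no backtracking. With several OS
processes sharing one index, each with its own tracker, one of them
can see a request that is "stale" relative to what another has
already written.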

The ideas from the other thread about having a UUID per db and
compaction are interesting. Are either of those included in the fs
layout stuff you were working on?

Paul

> Antony Blakey
> --------------------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> Human beings, who are almost unique in having the ability to learn from the
> experience of others, are also remarkable for their apparent disinclination
> to do so.
>  -- Douglas Adams
>
>
>
