incubator-couchdb-user mailing list archives

From "Ronny Hanssen" <super.ro...@gmail.com>
Subject Re: Bulk Load
Date Thu, 18 Sep 2008 13:30:51 GMT
I think I got it, kinda :). At least for a single-node setup. How to solve
this in a multi-node setup is beyond me, but I guess some patterns will
emerge as people start using CouchDB. For now, CouchDB only runs on a single
node anyway; that is, calls to CouchDB are not distributed. I know
replication works, but only as a manual feature, right?
Thanks,
Ronny

2008/9/18 Jan Lehnardt <jan@apache.org>

> Hi Ronny,
>
> not sure what you are trying to achieve here?
>
> My solution is good for a single-node instance, which is, if I
> remember correctly, what you asked for. It ignores the multi-node
> setup and merging revisions over multiple nodes, which is exactly
> what CouchDB does, for the simple reason that it is not easy to do :)
>
> Since you are manually handling the list of past revisions, you'd need
> to do the history merge on a multi-node conflict on your own.
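A minimal sketch of what that hand-rolled history merge might look like, assuming each conflicting version carries a hypothetical list of past version ids, oldest first. The function name and the "keep ours, append theirs" policy are illustrative only; CouchDB does not do this for you:

```python
def merge_histories(ours, theirs):
    """Merge two hypothetical past-version lists (oldest first).

    Keeps our order, then appends any versions only the other side
    knows about. One simple policy among many, not a CouchDB feature.
    """
    merged = list(ours)
    seen = set(ours)
    for version_id in theirs:
        if version_id not in seen:
            merged.append(version_id)
            seen.add(version_id)
    return merged


# Two nodes accepted different edits on top of the same shared history.
node_a = ["r1", "r2", "r3a"]
node_b = ["r1", "r2", "r3b"]
print(merge_histories(node_a, node_b))  # ['r1', 'r2', 'r3a', 'r3b']
```

The merged list no longer records which of r3a and r3b came first; an application that cares would need timestamps or its own tie-break rule.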
>
> Cheers
> Jan
> --
>
>
> On Sep 18, 2008, at 03:35, Ronny Hanssen wrote:
>
>> Hm.
>>
>> In Paul's case I am not 100% sure what is going on. Here's a use case for
>> two concurrent edits:
>>  * First, two users get the original.
>>  * Both make a copy, which they save.
>> This means that there are two fresh docs in CouchDB (even on a single
>> node).
>>  * Save the original under a new doc._id (which the copy is to persist in
>> copy.previous_version).
>> This means that the two new docs know where to find their previous
>> versions. The problem I have with this scheme is that every change to a
>> document means storing not only the new version but also its old version
>> (in addition to the original). The fact that two racing updates will
>> generate 4(!) new docs in addition to the original document is worrying.
>> I guess Paul also wants the original to be marked as deleted in the
>> _bulk_docs? But in any case, the previous versions are now two new docs,
>> and they look exactly the same, except for the doc._id, naturally...
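To make the doc count concrete, here is a sketch of that scheme against a plain dict standing in for the database. The function name and the exact field handling are assumptions drawn from the description above, not Paul's actual code:

```python
import uuid

# Hypothetical in-memory stand-in for one CouchDB database: _id -> document.
db = {"orig": {"_id": "orig", "title": "v1"}}


def copy_on_write_update(db, doc_id, changes):
    """One user's edit under the scheme described above (assumed shape)."""
    original = db[doc_id]
    # Archive the original under a fresh _id ...
    archived = dict(original, _id=uuid.uuid4().hex)
    # ... and save the edited copy under another fresh _id pointing back to it.
    edited = dict(original, **changes)
    edited["_id"] = uuid.uuid4().hex
    edited["previous_version"] = archived["_id"]
    # In CouchDB both would go in one _bulk_docs request; here we just insert them.
    for doc in (archived, edited):
        db[doc["_id"]] = doc
    return archived, edited


# Two users read the same original and both save their edits.
arch_a, edit_a = copy_on_write_update(db, "orig", {"title": "edit by A"})
arch_b, edit_b = copy_on_write_update(db, "orig", {"title": "edit by B"})
print(len(db) - 1)  # 4 new docs besides the original
```

The two archived copies differ only in `_id`, which is the duplication described above.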
>>
>> Wouldn't this be enough, Paul?
>> 1. old = get_doc()
>> 2. update = clone(old)  (minus _id/_rev, so it is stored as a new doc)
>> 3. update.previous_version = old._id
>> 4. post via _bulk_docs
>>
>> This way there won't be multiple old docs around.
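A sketch of those four steps, again against a dict standing in for the database. It assumes the clone drops `_id`/`_rev` so it is stored as a new doc; `linked_update` and `history` are hypothetical helper names:

```python
import uuid

# Hypothetical in-memory stand-in for one CouchDB database: _id -> document.
db = {"orig": {"_id": "orig", "title": "v1"}}


def linked_update(db, old_id, changes):
    old = db[old_id]                                  # 1. old = get_doc()
    # 2. clone, dropping the identity fields so it is stored as a new doc
    update = {k: v for k, v in old.items() if k not in ("_id", "_rev")}
    update.update(changes)
    update["previous_version"] = old["_id"]           # 3. link back to the old version
    update["_id"] = uuid.uuid4().hex                  # CouchDB would generate an _id if omitted
    db[update["_id"]] = update                        # 4. stand-in for posting via _bulk_docs
    return update


def history(db, doc_id):
    """Follow previous_version links from a version back to the original."""
    titles = []
    while doc_id is not None:
        doc = db[doc_id]
        titles.append(doc["title"])
        doc_id = doc.get("previous_version")
    return titles


v2 = linked_update(db, "orig", {"title": "v2"})
v3 = linked_update(db, v2["_id"], {"title": "v3"})
print(history(db, v3["_id"]))  # ['v3', 'v2', 'v1']
```

Two racing edits of the same version would both succeed here and leave two docs with the same `previous_version`, which is the "multiple current versions" downside raised later in this message.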
>>
>> Jan's way ensures that for a view there is always only one current version
>> of a doc, since it uses the built-in revision control. Competing updates on
>> the same node may fail, which is what CouchDB is designed to handle. If
>> they happen on different nodes, the revision history might get "out of
>> sync" via concurrent updates. How does CouchDB handle this? Which update
>> wins? On a single node this is caught when saving the doc. With multiple
>> nodes, both updates might get a response saying "save complete", so they
>> then need merging. How is that done? Jan further secures the previous
>> version by storing it as a new doc, allowing it to persist beyond
>> compaction. I guess Jan's sample would benefit nicely from _bulk_docs
>> too. I like this method because it allows only one current doc. But I
>> worry about how revision control handles conflicts, Jan?
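On a single node, the revision check Jan relies on can be modelled roughly like this. `Store` is a hypothetical in-memory model, not CouchDB's API: every write must carry the doc's current `_rev`, a batch is all-or-nothing (as with the transactional `_bulk_docs` discussed in this thread), and the previous version is archived as its own doc. The `id@rev` naming and the `version_of` field are made up for the example:

```python
import uuid


class Conflict(Exception):
    """Raised when a write carries a stale (or missing) _rev."""


class Store:
    """Hypothetical single-node model of CouchDB-style _rev checking."""

    def __init__(self):
        self.docs = {}

    def bulk_save(self, docs):
        # All-or-nothing: validate every revision before writing anything.
        for d in docs:
            current = self.docs.get(d["_id"])
            if current is not None and current["_rev"] != d.get("_rev"):
                raise Conflict(d["_id"])
        for d in docs:
            n = int(d["_rev"].split("-")[0]) + 1 if d.get("_rev") else 1
            self.docs[d["_id"]] = dict(d, _rev=f"{n}-{uuid.uuid4().hex[:8]}")


def save_with_archive(store, snapshot, changes):
    """Update a doc read earlier and keep its previous version as a separate doc."""
    archive = {k: v for k, v in snapshot.items() if k != "_rev"}
    archive["_id"] = f"{snapshot['_id']}@{snapshot['_rev']}"
    archive["version_of"] = snapshot["_id"]
    # The updated doc goes first, so a stale _rev is reported against it.
    store.bulk_save([dict(snapshot, **changes), archive])


store = Store()
store.bulk_save([{"_id": "doc", "title": "v1"}])
seen_by_a = dict(store.docs["doc"])  # both users read revision 1
seen_by_b = dict(store.docs["doc"])
save_with_archive(store, seen_by_a, {"title": "edit by A"})  # wins
try:
    save_with_archive(store, seen_by_b, {"title": "edit by B"})
except Conflict as err:
    print("conflict on", err)  # conflict on doc
```

B would then re-read the doc and retry. What this model cannot show is the multi-node case, where both writes are accepted and have to be reconciled afterwards.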
>>
>> Paul's and my updated suggestions always post new versions, not using the
>> revision system at all. The downside is that there may be multiple current
>> versions around... And this is a bit tricky, I believe... Anyone?
>>
>> Paul's suggestion also keeps multiple copies of the previous version. I am
>> not sure why, Paul?
>>
>>
>> Regards,
>> Ronny
>>
>> 2008/9/17 Paul Davis <paul.joseph.davis@gmail.com>
>>
>>> Good point, Chris.
>>>
>>> On Wed, Sep 17, 2008 at 11:39 AM, Chris Anderson <jchris@apache.org>
>>> wrote:
>>>
>>>> On Wed, Sep 17, 2008 at 11:34 AM, Paul Davis
>>>> <paul.joseph.davis@gmail.com> wrote:
>>>>
>>>>> Alternatively, something like the following might work:
>>>>>
>>>>> Keep an eye on the specifics of _bulk_docs though. There have been
>>>>> requests to make it non-atomic, but I think in the face of something
>>>>> like this we might make non-atomic _bulk_docs a non-default or some
>>>>> such.
>>>>>
>>>>
>>>> I think the need for non-transactional bulk docs will be obviated when
>>>> we have the failure response say which docs caused the failure; that way
>>>> one can retry once to save all the non-conflicting docs, and then loop
>>>> back through to handle the conflicts.
>>>>
>>>> Upshot: I bet you can count on bulk docs being transactional.
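Chris's retry pattern, sketched against a simulated transactional bulk save. A failure response naming the conflicting ids is the behaviour he expects, modelled here as an exception carrying those ids; `BulkConflict`, `save_all` and the integer `_rev` counters are hypothetical, and the `resolve` callback stands in for whatever merge policy the application chooses:

```python
class BulkConflict(Exception):
    """Hypothetical failure response that names the conflicting doc ids."""

    def __init__(self, ids):
        super().__init__(ids)
        self.ids = set(ids)


def transactional_bulk_save(db, docs):
    """All-or-nothing save into a dict; _rev is a simple counter here."""
    bad = [d["_id"] for d in docs
           if d["_id"] in db and db[d["_id"]]["_rev"] != d.get("_rev")]
    if bad:
        raise BulkConflict(bad)
    for d in docs:
        db[d["_id"]] = dict(d, _rev=str(int(d.get("_rev") or 0) + 1))


def save_all(db, docs, resolve):
    try:
        transactional_bulk_save(db, docs)
        return
    except BulkConflict as err:
        conflicted = err.ids
    # Retry once with everything that did not conflict ...
    transactional_bulk_save(db, [d for d in docs if d["_id"] not in conflicted])
    # ... then loop back through the conflicts one at a time.
    for d in docs:
        if d["_id"] in conflicted:
            transactional_bulk_save(db, [resolve(db[d["_id"]], d)])


db = {"a": {"_id": "a", "_rev": "1", "n": 1}}
docs = [{"_id": "a", "_rev": "0", "n": 10},  # stale revision: conflicts
        {"_id": "b", "n": 2}]                # new doc: saves fine
# Simplest possible policy: my fields win, rebased onto the current _rev.
save_all(db, docs, resolve=lambda current, mine: dict(mine, _rev=current["_rev"]))
print(db["a"]["n"], db["a"]["_rev"], db["b"]["n"])  # 10 2 2
```

With several clients writing at once, the retry and the per-conflict saves could conflict again, so a real loop would re-read and repeat; the sketch keeps a single pass to mirror the description.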
>>>>
>>>>
>>>> --
>>>> Chris Anderson
>>>> http://jchris.mfdz.com
>>>>
>>>>
>>>
>
