couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Bulk Load
Date Thu, 18 Sep 2008 13:49:47 GMT

On Sep 18, 2008, at 15:30, Ronny Hanssen wrote:

> I think I got it, kinda :). At least for a single-node setup. How to solve
> this in a multi-node setup is beyond me, but I guess that some patterns will
> emerge as people start using CouchDB. For now, CouchDB still runs on a single
> node anyway; that is, the calls to CouchDB are not distributed. I know
> replication works, but only as a manual feature, right?

Hook it up to a cronjob or the DbUpdateNotification, and it is automatic
(cf. http://code.google.com/p/couchdb-python/source/browse/trunk/couchdb/tools/replication_helper.py)
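A minimal sketch of such an automatic push-replication trigger in Python, in the spirit of the linked replication_helper.py but not a copy of it. It assumes that the update notifier writes one JSON object per line on stdin, shaped like {"type": "updated", "db": "mydb"}, and that the server accepts POST /_replicate with "source" and "target" fields. Both server URLs are placeholders, and the function names are hypothetical.

```python
import json
import sys
import urllib.request

# Hypothetical server URLs; replace with your own.
LOCAL_SERVER = "http://localhost:5984"
REMOTE_SERVER = "http://replica.example.com:5984"


def updated_db(line):
    """Return the database name from one update-notification line, or None.

    Assumes each line is a JSON object such as {"type": "updated", "db": "mydb"}.
    """
    try:
        event = json.loads(line)
    except ValueError:
        return None
    if isinstance(event, dict) and event.get("type") == "updated":
        return event.get("db")
    return None


def replication_body(db_name, remote=REMOTE_SERVER):
    """Body for POST /_replicate: push the local db to the same name remotely."""
    return {"source": db_name, "target": f"{remote}/{db_name}"}


def push(db_name, local=LOCAL_SERVER, remote=REMOTE_SERVER):
    """Ask the local server to replicate db_name to the remote server."""
    request = urllib.request.Request(
        f"{local}/_replicate",
        data=json.dumps(replication_body(db_name, remote)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def main(lines=sys.stdin):
    """Replicate every database the notifier reports as updated."""
    for line in lines:
        db_name = updated_db(line)
        if db_name:
            push(db_name)
```

Registered as the notification process, the script would call main(); from a cronjob you would instead call push("mydb") on a schedule, which replicates whether or not anything changed.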

Cheers
Jan
--
>
> Thanks,
> Ronny
>
> 2008/9/18 Jan Lehnardt <jan@apache.org>
>
>> Hi Ronny,
>>
>> I'm not sure what you are trying to achieve here.
>>
>> My solution is good for a single-node instance, which, if I remember
>> correctly, is what you asked for. It ignores the multi-node setup and the
>> merging of revisions across multiple nodes, which is exactly what CouchDB
>> does as well, for the simple reason that it is not easy to do :)
>>
>> Since you are handling the list of past revisions manually, you'd need
>> to do the history merge on a multi-node conflict on your own.
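What "doing the history merge on your own" could look like, as a minimal Python sketch: when two nodes each extended the manually kept history list, the resolved document keeps one side's content but the union of both histories. The field name "past_versions" and both helpers are hypothetical, and choosing which side wins is left to the application.

```python
def merge_histories(ours, theirs):
    """Union of two past-version id lists, keeping first-seen order."""
    merged, seen = [], set()
    for version_id in list(ours) + list(theirs):
        if version_id not in seen:
            seen.add(version_id)
            merged.append(version_id)
    return merged


def resolve_conflict(winner, loser, field="past_versions"):
    """Keep the winner's content but carry over history known only to the loser.

    `field` is a hypothetical name for the manually maintained history list.
    """
    resolved = dict(winner)
    resolved[field] = merge_histories(winner.get(field, []), loser.get(field, []))
    return resolved
```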
>>
>> Cheers
>> Jan
>> --
>>
>>
>> On Sep 18, 2008, at 03:35, Ronny Hanssen wrote:
>>
>>> Hm.
>>>
>>> In Paul's case I am not 100% sure what is going on. Here's a use case for
>>> two concurrent edits:
>>> * First, two users get the original.
>>> * Both make a copy, which they save.
>>>   This means that there are two fresh docs in CouchDB (even on a single
>>>   node).
>>> * The original is saved under a new doc._id (which the copy stores in
>>>   copy.previous_version).
>>> This means that the two new docs know where to find their previous
>>> versions. The problem I have with this scheme is that every change of a
>>> document requires storing not only the new version but also its old
>>> version (in addition to the original). The fact that two racing updates
>>> will generate 4(!) new docs in addition to the original document is
>>> worrying. I guess Paul also wants the original to be marked as deleted in
>>> the _bulk_docs? But in any case, the previous versions are now two new
>>> docs that look exactly the same, except for the doc._id, naturally...
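The doc count in the racing-update case can be checked with a small in-memory simulation. This is a sketch of the scheme as Ronny describes it, not Paul's actual code: a plain dict stands in for the database, and paul_style_edit is a hypothetical helper.

```python
import itertools


def paul_style_edit(store, original_id, changes, ids):
    """One user's edit under the scheme described above (hypothetical helper).

    The original is re-saved under a fresh id, and the edited copy is saved
    as another fresh doc pointing at that archived copy.
    """
    original = store[original_id]
    archived_id = next(ids)
    store[archived_id] = {**original, "_id": archived_id}
    copy_id = next(ids)
    store[copy_id] = {**original, **changes, "_id": copy_id,
                      "previous_version": archived_id}
    return copy_id


# Two racing edits of the same original, on a plain in-memory "database".
store = {"doc": {"_id": "doc", "title": "v1"}}
ids = (f"id{n}" for n in itertools.count(1))
paul_style_edit(store, "doc", {"title": "alice's edit"}, ids)
paul_style_edit(store, "doc", {"title": "bob's edit"}, ids)
print(len(store) - 1, "new docs")  # 4 new docs
```

The two archived copies (id1 and id3) differ only in their _id, which is the duplication Ronny points out.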
>>>
>>> Wouldn't this be enough, Paul?
>>> 1. old = get_doc()
>>> 2. update = clone(old), with a fresh _id and no _rev
>>> 3. update.previous_version = old._id
>>> 4. post via _bulk_docs
>>>
>>> This way there won't be multiple old docs around.
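Ronny's four steps, sketched in Python. One detail the steps need: the clone must get a fresh _id and drop _rev; otherwise CouchDB would treat the save as an edit of the old document (or reject it as a conflict) rather than as a new doc. next_version and bulk_payload are hypothetical helpers, and uuid4 is just one way to mint an id.

```python
import copy
import json
import uuid


def next_version(old):
    """Build a new-version doc that points back at `old` (hypothetical helper)."""
    update = copy.deepcopy(old)
    update.pop("_rev", None)          # a brand-new doc, not an update of `old`
    update["_id"] = uuid.uuid4().hex  # fresh id, so `old` stays untouched
    update["previous_version"] = old["_id"]
    return update


def bulk_payload(*docs):
    """JSON body for POST /<db>/_bulk_docs."""
    return json.dumps({"docs": list(docs)})
```

Only one new doc is written per edit, which is the point of the proposal. The old version is simply left where it is.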
>>>
>>> Jan's way ensures that for a view there is always only one current version
>>> of a doc, since it uses the built-in revision control. Competing updates on
>>> the same node may fail, which is what CouchDB is designed to handle. If
>>> they happen on different nodes, the revision history might get "out of
>>> sync" through concurrent updates. How does CouchDB handle this? Which update
>>> wins? On a single node the conflict is caught when saving the doc. With
>>> multiple nodes, both writers might get a response saying "save complete",
>>> so the versions then need merging. How is that done? Jan also secures the
>>> previous version by storing it as a new doc, allowing it to persist beyond
>>> compaction. I guess Jan's sample would benefit nicely from _bulk_docs too.
>>> I like this method because it allows only one current doc. But I worry
>>> about how revision control handles conflicts, Jan?
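Jan's single-node scheme as Ronny summarizes it, combined with _bulk_docs, might look like the following Python sketch. It builds a request body that saves the previous version as its own doc and updates the current doc in place, keeping a manually maintained list of archived ids. The helper and the "past_versions" field name are hypothetical; Jan's own sample is not shown in this thread.

```python
import copy
import uuid


def archive_and_update(current, changes, field="past_versions"):
    """Build a _bulk_docs body that archives `current` and updates it in place.

    The archived copy gets a fresh _id and no _rev, so it is saved as a new
    doc; the updated doc keeps current's _id and _rev, so a competing edit on
    the same node is rejected by CouchDB's revision check.
    """
    archived = copy.deepcopy(current)
    archived.pop("_rev", None)
    archived["_id"] = uuid.uuid4().hex
    updated = copy.deepcopy(current)
    updated.update(changes)
    updated[field] = current.get(field, []) + [archived["_id"]]
    return {"docs": [archived, updated]}
```

With transactional _bulk_docs, a rejected update also means the archived copy is not written. With non-atomic bulk saves, the archive could be stored even when the update fails.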
>>>
>>> Paul's suggestion and my updated one always post new versions, without
>>> using the revision system at all. The downside is that there may be
>>> multiple current versions around... And this is a bit tricky, I believe...
>>> Anyone?
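Why the always-post-new-docs approach can end up with several current versions: nothing prevents two docs from naming the same previous_version (the field from Ronny's steps). Finding the current versions then means finding the docs that nothing else points to. The following in-memory Python sketch shows that check; in a real deployment it would more likely be a view. current_versions is a hypothetical helper.

```python
def current_versions(docs):
    """Return ids of docs that no other doc names as its previous_version."""
    superseded = {d["previous_version"] for d in docs if "previous_version" in d}
    return sorted(d["_id"] for d in docs if d["_id"] not in superseded)


history = [
    {"_id": "a"},
    {"_id": "b", "previous_version": "a"},
    {"_id": "c", "previous_version": "a"},  # a racing edit of the same doc
]
print(current_versions(history))  # ['b', 'c']
```

A single linear history yields exactly one id. Two racing edits yield two, and the application has to decide how to reconcile them.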
>>>
>>> Paul's suggestion also keeps multiple copies of the previous version. I am
>>> not sure why, Paul?
>>>
>>>
>>> Regards,
>>> Ronny
>>>
>>> 2008/9/17 Paul Davis <paul.joseph.davis@gmail.com>
>>>
>>>> Good point, Chris.
>>>>
>>>> On Wed, Sep 17, 2008 at 11:39 AM, Chris Anderson <jchris@apache.org>
>>>> wrote:
>>>>
>>>>> On Wed, Sep 17, 2008 at 11:34 AM, Paul Davis
>>>>> <paul.joseph.davis@gmail.com> wrote:
>>>>>
>>>>>> Alternatively, something like the following might work:
>>>>>>
>>>>>> Keep an eye on the specifics of _bulk_docs, though. There have been
>>>>>> requests to make it non-atomic, but I think in the face of something
>>>>>> like this we might make non-atomic _bulk_docs a non-default option or
>>>>>> some such.
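For reference, here is a minimal Python sketch that builds a _bulk_docs request with the standard library. The optional all_or_nothing field is the opt-in that later CouchDB releases added when _bulk_docs stopped being atomic by default, roughly the "non-default" outcome Paul anticipates. The CouchDB in this thread did not have it. The server URL and database name are placeholders.

```python
import json
import urllib.request


def bulk_docs_request(base_url, db, docs, all_or_nothing=False):
    """Build (but do not send) a POST to /<db>/_bulk_docs."""
    body = {"docs": docs}
    if all_or_nothing:
        body["all_or_nothing"] = True  # opt-in atomicity in later releases
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is a matter of urllib.request.urlopen(request) followed by json.load on the response.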
>>>>>>
>>>>>
>>>>> I think the need for non-transactional bulk docs will be obviated when
>>>>> we have the failure response say which docs caused the failure. That way
>>>>> one can retry once to save all the non-conflicting docs, and then loop
>>>>> back through to handle the conflicts.
>>>>>
>>>>> upshot: I bet you can count on bulk docs being transactional.
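Chris's retry pattern can be sketched in Python against a hypothetical in-memory stand-in (InMemoryDB). Its bulk_save is all-or-nothing and returns the ids of stale docs, mirroring the failure response he anticipates rather than any API CouchDB offered at the time. Revisions are plain integers for simplicity. take_latest_rev is a naive last-writer-wins resolver chosen only for illustration; a real application would merge.

```python
class InMemoryDB:
    """Hypothetical stand-in for a database with all-or-nothing bulk saves."""

    def __init__(self):
        self.revs = {}  # doc id -> current integer revision
        self.docs = {}  # doc id -> stored body

    def bulk_save(self, docs):
        """Save every doc or none; return the ids whose _rev is stale."""
        conflicts = [d["_id"] for d in docs
                     if d.get("_rev") != self.revs.get(d["_id"])]
        if conflicts:
            return conflicts  # transactional: nothing was written
        for d in docs:
            self.revs[d["_id"]] = self.revs.get(d["_id"], 0) + 1
            self.docs[d["_id"]] = dict(d, _rev=self.revs[d["_id"]])
        return []


def save_all(db, docs, resolve, max_attempts=5):
    """Save the clean docs in one retry, then loop over the conflicting ones."""
    conflicts = set(db.bulk_save(docs))
    if not conflicts:
        return
    clean = [d for d in docs if d["_id"] not in conflicts]
    if clean and db.bulk_save(clean):
        raise RuntimeError("unexpected conflict while saving clean docs")
    for doc in (d for d in docs if d["_id"] in conflicts):
        for _ in range(max_attempts):
            doc = resolve(db, doc)
            if not db.bulk_save([doc]):
                break
        else:
            raise RuntimeError(f"gave up on {doc['_id']}")


def take_latest_rev(db, doc):
    """Naive resolution: overwrite whatever is stored now (last writer wins)."""
    return dict(doc, _rev=db.revs.get(doc["_id"]))
```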
>>>>>
>>>>>
>>>>> --
>>>>> Chris Anderson
>>>>> http://jchris.mfdz.com
>>>>>
>>>>>
>>>>
>>

