couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: rep_security merge to trunk
Date Wed, 11 Mar 2009 23:51:24 GMT

On Mar 11, 2009, at 7:07 PM, Chris Anderson wrote:

> On Wed, Mar 11, 2009 at 8:34 AM, Damien Katz <damien@apache.org>  
> wrote:
>>
>> On Mar 10, 2009, at 7:06 PM, Chris Anderson wrote:
>>
>>> On Tue, Mar 10, 2009 at 3:44 PM, Damien Katz <damien@apache.org>  
>>> wrote:
>>>>
>>>> This patch breaks the file format and replication API, so  
>>>> replication
>>>> with
>>>> earlier versions is not possible.
>>>
>>> The rev format has changed. Does this mean that migrating existing
>>> data will involve getting each doc from oldDB, stripping the _rev,  
>>> and
>>> loading it into newDB?
>>
>> Yes, but it should be possible to convert the revs to the new  
>> format too.
>> But why?
>>
>>>
>>> It should be pretty straightforward to write a Python or Ruby script
>>> that does this in bulk to transfer docs. It's essentially a  
>>> version of
>>> the python dump / load tools that doesn't require putting the  
>>> whole db
>>> on disk as an intermediary.
>>>
>>> I'll volunteer but I wonder how I should handle docs with  
>>> conflicts in
>>> the oldDB?
>>
>> Oh that's why. Using the replicator API would work for that.
>>
>
> A little confused as to the plan here. Let me try to articulate:
>
> Write a script that pulls all_docs_by_seq from the old version of
> CouchDB in batches of 1000, and for each doc loads the head rev (and
> any conflict revs) into memory.
>
> Then it creates a bulk_docs POST for those docs, by stripping the rev
> from any docs that don't have conflicts, and any docs that have
> conflicts, creating a series of revs like this (pretend there are 199
> conflict revs)
>
> 1-sdfjhgsaf
> 2-asdfkjsad
> ..
> 199-asdf7tsfd
>
> and applying the revs to each doc in the conflict set. Does the rev
> ordering matter? Assuming I don't reuse the prefix number, does the
> format/length of the second rev part matter?
>
> Then using a normal POST of an object like {"docs":[...array of
> docs...]} to the /db/_bulk_docs URL (with no special query option),
> the new docs (and conflict revs) will get stored in the new DB?
>
> Or do I need to assign well-formed made up revs to the non-conflicting
> docs (they'd all get "1-foobar") and use the ?new_edits=false option
> on the bulk_docs POST ?



To use new_edits=false, you have to specify a rev history in each doc's
_revisions property, like this:
{"new_edits": false,
  "docs": [
     {"_id": "foo", "_revisions": {"start": 2, "ids": ["133457546", "475133454"]}}
  ]}
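
As a rough illustration of the _revisions shape in the example above, the sketch below converts a list of full rev strings (newest first) into a start offset plus bare ids, and back again. The function names to_revisions and from_revisions are hypothetical helpers invented for this example, not part of CouchDB or any client library, and the check that prefixes count down by one is an assumption about what a well-formed history looks like.

```python
def to_revisions(revs):
    """Build a _revisions-style dict from full revs, newest first.

    Hypothetical helper: e.g. ["2-abc", "1-def"] becomes
    {"start": 2, "ids": ["abc", "def"]}.
    """
    if not revs:
        raise ValueError("at least one rev is required")
    start = None
    ids = []
    for i, rev in enumerate(revs):
        # Split "N-id" on the first hyphen only.
        prefix, sep, rev_id = rev.partition("-")
        if not sep:
            raise ValueError("rev %r has no numeric prefix" % rev)
        pos = int(prefix)
        if start is None:
            start = pos
        # Assumption: each older rev's prefix is one less than the previous.
        if pos != start - i:
            raise ValueError("rev %r breaks the descending sequence" % rev)
        ids.append(rev_id)
    return {"start": start, "ids": ids}


def from_revisions(revisions):
    """Expand a _revisions-style dict back into full rev strings."""
    start = revisions["start"]
    return ["%d-%s" % (start - i, rev_id)
            for i, rev_id in enumerate(revisions["ids"])]
```

For the history in the example, to_revisions(["2-133457546", "1-475133454"]) yields a start of 2 and the two bare ids, and from_revisions reverses that.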

The ids are the rev ids without the leading offset; they are sent this
way for efficiency. Converted to regular revs, they would look like
"2-133457546" and "1-475133454".

For importing existing docs, I think you could just use the
all_or_nothing:true option and save the multiple copies of the same
document in one request. They'll all be saved, and you don't have to
worry about the _revisions stuff.
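
A minimal sketch of what such an import body might look like, assuming the head rev and each conflict rev of a document have already been fetched from the old database as plain dicts. The helper name build_import_payload is hypothetical, and stripping the old-format _rev from each copy is an assumption carried over from the earlier discussion in this thread, not something confirmed here. The sketch only builds the JSON body; sending it to /db/_bulk_docs is left out.

```python
import json


def build_import_payload(docs):
    """Build a _bulk_docs body that saves every copy passed in.

    Hypothetical helper: docs may contain several dicts sharing one _id
    (the head rev plus its conflict revs).
    """
    cleaned = []
    for doc in docs:
        copy = dict(doc)        # leave the caller's dict untouched
        copy.pop("_rev", None)  # assumption: old-format revs are dropped
        cleaned.append(copy)
    return {"all_or_nothing": True, "docs": cleaned}


# Two copies of the same document, as they might come from the old DB.
payload = build_import_payload([
    {"_id": "foo", "_rev": "123456", "value": "head"},
    {"_id": "foo", "_rev": "654321", "value": "conflict"},
])
body = json.dumps(payload)  # string to POST to /db/_bulk_docs
```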

-Damien

>
> I think getting this clear on the list will help everyone's
> understanding of the new bulk_docs semantics. (I don't plan to include
> in my migrator the ability to transfer any docs which would be lost on
> the source DB during compaction... only the HEAD rev and any conflicts
> will be transferred.)
>
> Chris
>
> ps I tagged trunk as bulk_transactions (maybe coulda picked a better
> name) so we have a record of the last point of 0.9 development that
> had the old semantics. Please don't use this tag.
>
> -- 
> Chris Anderson
> http://jchris.mfdz.com

