couchdb-dev mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: rep_security merge to trunk
Date Wed, 11 Mar 2009 23:07:26 GMT
On Wed, Mar 11, 2009 at 8:34 AM, Damien Katz <damien@apache.org> wrote:
>
> On Mar 10, 2009, at 7:06 PM, Chris Anderson wrote:
>
>> On Tue, Mar 10, 2009 at 3:44 PM, Damien Katz <damien@apache.org> wrote:
>>>
>>> This patch breaks the file format and replication API, so replication
>>> with earlier versions is not possible.
>>
>> The rev format has changed. Does this mean that migrating existing
>> data will involve getting each doc from oldDB, stripping the _rev, and
>> loading it into newDB?
>
> Yes, but it should be possible to convert the revs to the new format too.
> But why?
>
>>
>> It should be pretty straightforward to write a Python or Ruby script
>> that does this in bulk to transfer docs. It's essentially a version of
>> the python dump / load tools that doesn't require putting the whole db
>> on disk as an intermediary.
>>
>> I'll volunteer but I wonder how I should handle docs with conflicts in
>> the oldDB?
>
> Oh that's why. Using the replicator API would work for that.
>

I'm a little confused about the plan here. Let me try to articulate it:

Write a script that pulls _all_docs_by_seq from the old version of
CouchDB in batches of 1000 and, for each doc, loads the head rev (and
any conflict revs) into memory.
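A minimal Python sketch of that batching loop. It assumes a hypothetical callback `fetch_page(after_seq, limit)` standing in for the HTTP GET against the old server's `_all_docs_by_seq`, returning at most `limit` rows whose `"key"` is an integer update sequence greater than `after_seq`. The real query parameters and row shape of the old API are deliberately left out, and the in-memory fake is for illustration only.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def iter_batches(fetch_page: Callable[[int, int], List[Row]],
                 batch_size: int = 1000) -> Iterator[List[Row]]:
    """Yield rows from the source DB in update-sequence order, batch by batch.

    `fetch_page` is a hypothetical stand-in for the HTTP call; it must return
    up to `limit` rows whose "key" (the update seq) is greater than `after_seq`.
    """
    last_seq = 0
    while True:
        rows = fetch_page(last_seq, batch_size)
        if not rows:
            return
        yield rows
        # Resume after the highest sequence seen so far.
        last_seq = int(rows[-1]["key"])
        if len(rows) < batch_size:
            return  # a short page means we reached the end


# Demo against an in-memory fake of the old database (2,500 docs).
_FAKE_ROWS = [{"id": f"doc{i}", "key": i} for i in range(1, 2501)]


def fake_fetch(after_seq: int, limit: int) -> List[Row]:
    return [r for r in _FAKE_ROWS if int(r["key"]) > after_seq][:limit]


print([len(b) for b in iter_batches(fake_fetch)])  # [1000, 1000, 500]
```

Keying the resume point on the last sequence seen (rather than on a page offset) means each batch picks up exactly where the previous one stopped.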

Then it builds a bulk_docs POST for those docs, stripping the _rev from
any doc that doesn't have conflicts and, for any doc that does,
creating a series of revs like this (pretend there are 199 conflict
revs):

1-sdfjhgsaf
2-asdfkjsad
...
199-asdf7tsfd

and applying those revs to the docs in the conflict set. Does the rev
ordering matter? Assuming I don't reuse a prefix number, does the
format or length of the part after the dash matter?
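The rev-numbering scheme described above might look like this in Python. The random hex suffix is an arbitrary assumption (whether the server cares about the suffix's format or length is exactly the open question), and `assign_conflict_revs` is a hypothetical helper name.

```python
import uuid
from typing import Dict, List


def assign_conflict_revs(conflict_set: List[Dict]) -> List[Dict]:
    """Give each body in a conflict set its own made-up rev: 1-..., 2-..., N-...

    Prefix numbers are never reused; the suffix is a random hex string, an
    assumption, since the required suffix format is still an open question.
    """
    result = []
    for n, doc in enumerate(conflict_set, start=1):
        new_doc = dict(doc)  # copy so the caller's dicts are untouched
        new_doc["_rev"] = f"{n}-{uuid.uuid4().hex}"
        result.append(new_doc)
    return result


# Demo: three conflicting versions of the same doc from the old DB.
versions = [{"_id": "a", "_rev": "old-rev", "v": i} for i in range(3)]
for d in assign_conflict_revs(versions):
    print(d["_rev"].split("-")[0], d["v"])
```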

Then, if I send a normal POST of an object like {"docs":[...array of
docs...]} to the /db/_bulk_docs URL (with no special query option),
will the new docs (and conflict revs) get stored in the new DB?
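Assembling that plain POST in Python might look like the sketch below. It only builds the request; `http://localhost:5984` and `newdb` are placeholder values, `strip_rev` and `bulk_docs_request` are hypothetical helper names, and whether the server actually stores docs sent this way is the question being asked.

```python
import json
import urllib.request
from typing import Dict, List


def strip_rev(doc: Dict) -> Dict:
    """Return a copy of `doc` without its old-format _rev."""
    return {k: v for k, v in doc.items() if k != "_rev"}


def bulk_docs_request(base_url: str, db: str,
                      docs: List[Dict]) -> urllib.request.Request:
    """Build (but do not send) a plain POST to /db/_bulk_docs."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",  # no special query option
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = bulk_docs_request("http://localhost:5984", "newdb",
                        [strip_rev({"_id": "a", "_rev": "old-rev", "v": 1})])
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would actually send it.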

Or do I need to assign well-formed, made-up revs to the non-conflicting
docs (they'd all get "1-foobar") and use the ?new_edits=false option
on the bulk_docs POST?
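For comparison, a sketch of the ?new_edits=false variant: every non-conflicting doc gets the same placeholder rev from the question ("1-foobar") and the query option is appended to the URL. The helper names and server URL are hypothetical, and the code only builds the request without sending it.

```python
import json
import urllib.request
from typing import Dict, List

PLACEHOLDER_REV = "1-foobar"  # made-up rev taken from the question above


def with_placeholder_rev(doc: Dict) -> Dict:
    """Copy `doc`, replacing any old-format _rev with the placeholder."""
    new_doc = dict(doc)
    new_doc["_rev"] = PLACEHOLDER_REV
    return new_doc


def bulk_docs_replay_request(base_url: str, db: str,
                             docs: List[Dict]) -> urllib.request.Request:
    """Build (but do not send) a _bulk_docs POST with new_edits=false."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs?new_edits=false",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = bulk_docs_replay_request(
    "http://localhost:5984", "newdb",
    [with_placeholder_rev({"_id": "a", "_rev": "old-rev", "v": 1})],
)
print(req.full_url)  # http://localhost:5984/newdb/_bulk_docs?new_edits=false
```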

I think getting this clear on the list will help everyone understand
the new bulk_docs semantics. (I don't plan to have my migrator
transfer anything that compaction would discard on the source DB...
only the HEAD rev and any conflicts will be transferred.)
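A small sketch of that selection rule: for each doc read from the source DB, only the HEAD rev and any conflicting revs are kept. It assumes each doc carries its conflicting revs in a `_conflicts` list alongside `_rev` (the way CouchDB reports conflicts when a doc is fetched with `?conflicts=true`); `revs_to_transfer` is a hypothetical helper and the rev strings are placeholders, not real rev formats.

```python
from typing import Dict, List


def revs_to_transfer(doc: Dict) -> List[str]:
    """Return the HEAD rev followed by any conflict revs, and nothing older.

    Older ancestor revs (the ones compaction would discard) are never
    listed, so they are not migrated.
    """
    return [doc["_rev"]] + list(doc.get("_conflicts", []))


print(revs_to_transfer({"_id": "a", "_rev": "rev-head",
                        "_conflicts": ["rev-c1", "rev-c2"]}))
print(revs_to_transfer({"_id": "b", "_rev": "rev-head"}))
```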

Chris

P.S. I tagged trunk as bulk_transactions (maybe I could have picked a
better name) so we have a record of the last point of 0.9 development
that had the old semantics. Please don't use this tag.

-- 
Chris Anderson
http://jchris.mfdz.com
