incubator-couchdb-user mailing list archives

From "Jeff Hinrichs - DM&T" <dunde...@gmail.com>
Subject Re: Inability to back up and restore properly
Date Thu, 09 Apr 2009 03:03:17 GMT
On Tue, Apr 7, 2009 at 11:00 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> On Tue, Apr 7, 2009 at 10:42 AM, Nils Breunese <n.breunese@vpro.nl> wrote:
>> Jeff Hinrichs - DM&T wrote:
>>
>>> What is the proper way to backup and restore a couchdb?  I mean a real
>>> proper dump/load cycle.
>>>
>>> CouchDB doesn't provide a way to do a proper dump/reload cycle, which
>>> leaves us to try and write our own.  However, if you dump a document like
>>>
>>> {'_id':'foo','_rev':'2-xyz',...}
>>>
>>> There is not a single way that I can find to load an empty database and
>>> recreate that same record.  If you put {'_id':'foo','_rev':'2-xyz',...},
>>> you get {'_id':'foo','_rev':'3-mno',...}, which is not the same as
>>> {'_id':'foo','_rev':'2-xyz',...}.
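The revision strings above have the form `N-hash`: `N` counts the edits made to the document, and a normal update that supplies the current `_rev` is treated as a new edit, so the generation goes from 2 to 3 and a new hash is assigned. A minimal sketch of that bookkeeping, assuming only the `N-hash` layout shown in the message; the hash part is treated as opaque, and `parse_rev` and `next_generation` are hypothetical helper names:

```python
def parse_rev(rev: str) -> tuple[int, str]:
    """Split a revision like '2-xyz' into (generation, opaque hash)."""
    gen, _, digest = rev.partition("-")
    if not gen.isdigit() or not digest:
        raise ValueError(f"not a revision string: {rev!r}")
    return int(gen), digest


def next_generation(rev: str) -> int:
    """Generation that a normal update based on `rev` would receive."""
    gen, _ = parse_rev(rev)
    return gen + 1


print(parse_rev("2-xyz"))        # (2, 'xyz')
print(next_generation("2-xyz"))  # 3
```

This is why re-posting a dumped document can never reproduce `2-xyz`: the ordinary write path always produces the next generation.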
>>>
>>> In some use cases it is necessary to be able to restore data to the way it
>>> was at a point in time: sometimes for logic reasons, sometimes for error
>>> recovery and debugging, and sometimes for legal reasons.  The only way
>>> that seems possible is to bring up another CouchDB instance and replicate
>>> to it.  However, that is a bit problematic for normal long-term storage
>>> methodologies.
>>>
>>> What is the API I should be using?  If no such API exists, is it an
>>> oversight or just a matter of resources?  There should be a way to load
>>> data into CouchDB and have it just accept it, keeping the _rev
>>> information that is passed.  I am not proposing to change the mode of
>>> operation, but to create a new one.  Even better would be to have CouchDB
>>> provide a /database/_dump that streams out documents and a POST to
>>> /database/_load that accepts the file produced by /database/_dump.
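Neither `/database/_dump` nor `/database/_load` exists; they are the proposal in this message. One possible file format for such a stream is one JSON document per line, which can be written and read back incrementally without holding the whole database in memory. A minimal sketch of that hypothetical format (`write_dump` and `read_dump` are illustrative names, not CouchDB APIs); it only preserves `_rev` in the file, and the load side would still need the server to accept those revisions as given:

```python
import io
import json
from typing import Iterable, Iterator


def write_dump(docs: Iterable[dict], out: io.TextIOBase) -> int:
    """Write one JSON document per line, _rev included; return the count."""
    count = 0
    for doc in docs:
        out.write(json.dumps(doc, sort_keys=True) + "\n")
        count += 1
    return count


def read_dump(src: Iterable[str]) -> Iterator[dict]:
    """Yield documents back from a line-delimited dump, skipping blank lines."""
    for line in src:
        line = line.strip()
        if line:
            yield json.loads(line)


buf = io.StringIO()
write_dump([{"_id": "foo", "_rev": "2-xyz"}], buf)
buf.seek(0)
print(list(read_dump(buf)))  # [{'_id': 'foo', '_rev': '2-xyz'}]
```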
>>>
>>> So if you have some CouchDB database foo in state 'A', you dump it, then
>>> create database bar and load the dump from foo.  When the process is
>>> finished, a replication from foo (state 'A') to bar results in
>>>
>>> {"start_time":"Tue, 07 Apr 2009 03:02:16 GMT","end_time":"Tue, 07 Apr 2009 03:02:16 GMT","start_last_seq":0,"end_last_seq":100,"missing_checked":100,"missing_found":0,"docs_read":0,"docs_written":0,"doc_write_failures":0}
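The replication result above is the acceptance test being proposed: all 100 revisions were checked (`missing_checked`), none were missing on the target (`missing_found`), and so nothing was read or written. A minimal sketch of that check, assuming the history object has exactly the fields shown in the message; `is_identical_copy` is a hypothetical name:

```python
import json


def is_identical_copy(replication_history: str) -> bool:
    """True if the report shows the target already had every revision."""
    h = json.loads(replication_history)
    return (
        h["missing_found"] == 0
        and h["docs_read"] == 0
        and h["docs_written"] == 0
        and h["doc_write_failures"] == 0
    )


sample = (
    '{"start_time":"Tue, 07 Apr 2009 03:02:16 GMT",'
    '"end_time":"Tue, 07 Apr 2009 03:02:16 GMT",'
    '"start_last_seq":0,"end_last_seq":100,"missing_checked":100,'
    '"missing_found":0,"docs_read":0,"docs_written":0,"doc_write_failures":0}'
)
print(is_identical_copy(sample))  # True
```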
>>
>> AFAIK you can just copy (or rsync) the database files, even with CouchDB
>> running.
>>
>> Nils Breunese.
>>
>
> I forgot to mention that Nils is right as well with the caveat that
> you need to make sure you're copying databases between binary file
> compatible versions of CouchDB.
>
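The file-level backup Nils describes amounts to copying the database files. A minimal sketch, assuming the databases are stored as `*.couch` files directly inside a data directory whose location depends on the installation (`backup_couch_files` and its parameters are hypothetical). Because of Paul's caveat, it also writes a `manifest.json` recording the CouchDB version the caller reports, so a restore can target a binary-compatible release; that copying a live file is safe is taken from Nils's message, not verified here:

```python
import json
import shutil
from pathlib import Path


def backup_couch_files(data_dir: Path, backup_dir: Path, couchdb_version: str) -> list[str]:
    """Copy every *.couch file from data_dir into backup_dir.

    Also records the CouchDB version that produced the files, since they can
    only be restored into a binary-compatible release.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(data_dir.glob("*.couch")):
        shutil.copy2(src, backup_dir / src.name)  # copy2 keeps timestamps
        copied.append(src.name)
    manifest = {"couchdb_version": couchdb_version, "files": copied}
    (backup_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return copied
```

As Jeff points out, this preserves conflicts but ties the backup to a particular CouchDB version.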
I knew about copying the CouchDB database files, but don't like that
as an option for exactly the reason you mention: it gives you a backup
that forces you to store the source code of CouchDB with it.  However,
this is the only way I know of that keeps conflicted documents intact.
None of the scripts that I've reviewed, including mine, allow the dump
to include conflicted revisions of documents.  And if you do dump them,
there isn't a way to set the conflicted flag on a specific revision of a
document during the load.  (I could have missed that in the docs, though,
so links are welcome ;)
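Later CouchDB releases document two request options that together come close to this. `GET /db/doc?open_revs=all` (with `Accept: application/json`) returns every leaf revision of a document, the winner and any conflicting ones, as a JSON array of `{"ok": doc}` or `{"missing": rev}` entries; and `POST /db/_bulk_docs` with `"new_edits": false` stores the supplied `_rev` values instead of generating new ones, which is how replication recreates conflicts on a target. A minimal sketch of building the load body from such a response, assuming a server that supports both options (`conflict_preserving_payload` is a hypothetical name; fetching with `revs=true` as well would carry each leaf's `_revisions` history along):

```python
def conflict_preserving_payload(open_revs_response: list[dict]) -> dict:
    """Turn an open_revs=all response into a _bulk_docs request body.

    Each available leaf revision (winning or conflicting) becomes its own
    entry; "missing" entries are skipped.  new_edits=False asks the server
    to keep the supplied _rev values rather than assigning new ones.
    """
    leaves = [entry["ok"] for entry in open_revs_response if "ok" in entry]
    return {"new_edits": False, "docs": leaves}


response = [
    {"ok": {"_id": "foo", "_rev": "2-xyz", "v": 1}},
    {"ok": {"_id": "foo", "_rev": "2-abc", "v": 2}},
    {"missing": "3-def"},
]
payload = conflict_preserving_payload(response)
print([d["_rev"] for d in payload["docs"]])  # ['2-xyz', '2-abc']
```

Posting both sibling leaves for the same `_id` is what leaves the restored document in the same conflicted state as the source.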

This could be solved by CouchDB offering a mechanism/mode/API that allows
loading data into an empty database, something like /database/_dump
and /database/_load.
