incubator-couchdb-user mailing list archives

From Paul Davis <>
Subject Re: Inability to back up and restore properly
Date Thu, 09 Apr 2009 03:40:52 GMT
On Wed, Apr 8, 2009 at 11:03 PM, Jeff Hinrichs - DM&T
<> wrote:
> On Tue, Apr 7, 2009 at 11:00 AM, Paul Davis <> wrote:
>> On Tue, Apr 7, 2009 at 10:42 AM, Nils Breunese <> wrote:
>>> Jeff Hinrichs - DM&T wrote:
>>>> What is the proper way to backup and restore a couchdb?  I mean a real
>>>> proper dump/load cycle.
>>>> couchdb doesn't provide a way to do a proper dump/reload cycle which
>>>> leaves
>>>> us to try and write our own.  However, if you dump a document like
>>>> {'_id':'foo','_rev':'2-xyz',...}
>>>> There is not a single way that I can find to load an empty database and
>>>> recreate that same record.  If you put the
>>>> {'_id':'foo','_rev':'2-xyz',...}, you get
>>>> {'_id':'foo','_rev':'3-mno',...}, which is not the same as
>>>> {'_id':'foo','_rev':'2-xyz',...}.
>>>> In some use cases it is necessary to be able to restore data to the way it
>>>> was at a point in time.  Sometimes for logic reasons, sometimes for error
>>>> recovery and debugging, and sometimes for legal reasons.  The seemingly
>>>> only
>>>> way possible to do that is to bring up another couchdb instance and
>>>> replicate to it.  However, that is a bit problematic for normal long term
>>>> storage methodologies.
>>>> What is the API I should be using?  If no such API exists, is it an
>>>> oversight or just a matter of resources?  There should be a way to load
>>>> data
>>>> into couch and have couchdb just accept it, keeping the _rev information
>>>> that is passed.  I am not proposing to change the mode of operation, but
>>>> to
>>>> create a new one.  Even better would be to have couchdb do a
>>>> /database/_dump
>>>> that streams out documents and a post /database/_load with the posted file
>>>> from the /database/_dump.
>>>> So, say you have some couchdb database foo in state 'A'. You dump, then
>>>> create database bar, and load the dump from foo and when the process is
>>>> finished, a replication from foo state 'A' to bar results in
>>>> {"start_time":"Tue, 07 Apr 2009 03:02:16 GMT","end_time":"Tue, 07 Apr 2009
>>>> 03:02:16
>>>> GMT","start_last_seq":0,"end_last_seq":100,"missing_checked":100,"missing_found":0,"docs_read":0,"docs_written":0,"doc_write_failures":0}
>>> AFAIK you can just copy (or rsync) the database files, even with CouchDB
>>> running.
>>> Nils Breunese.
>> I forgot to mention that Nils is right as well with the caveat that
>> you need to make sure you're copying databases between binary file
>> compatible versions of CouchDB.
> I knew about copying the couchdb database files, but don't like that
> as an option because of the exact reason that you mention.  It gives
> you a backup that forces you to store the source code of couchdb with
> it.  However, this is the only way I know of that supports keeping
> conflicted documents intact.  None of the scripts that I've reviewed,
> including mine, allow for the dump to include conflicted revisions of
> documents.  And if you do dump them, there isn't a way to set the
> conflicted flag on a specific revision of a document during the load. (I
> could have missed that in the docs though -- so links are welcome ;)
> This could be solved by couchdb offering a mechanism/mode/api to allow
> loading data into an empty database.  Something like /database/_dump
> and /database/_load

I'm pretty sure it should be possible to get a conflicted version out
of CouchDB, but I haven't had to deal with this part of the API so I
can't speak with too much authority. If it turns out not to be the
case then we should rectify that. Trying to keep a copy of the source
code for backups is a fool's errand.
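If it does work, I'd expect it to go through something like GET
/db/docid?open_revs=all, which should hand back every leaf revision of
a document, conflicts included. A rough sketch of picking the conflicts
out of that response (the response shape here is my assumption from the
docs, not something I've tested):

```python
def leaf_revisions(open_revs_response):
    """Collect every leaf revision (winner and conflicts alike) from
    a GET /db/docid?open_revs=all JSON response, which -- as far as I
    can tell -- is a list of {"ok": <doc>} entries, one per surviving
    revision branch."""
    return {entry["ok"]["_rev"]: entry["ok"]
            for entry in open_revs_response if "ok" in entry}

# Simulated response for a document with one conflict:
response = [
    {"ok": {"_id": "foo", "_rev": "2-xyz", "value": 1}},
    {"ok": {"_id": "foo", "_rev": "2-abc", "value": 2}},
]
revs = leaf_revisions(response)
```

Anything this returns beyond the winning revision is a conflict you'd
want your dump to carry along.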

Also, assuming you can get the revision history information out of
CouchDB, the algorithms involved are already guaranteed to pick the
same conflict winner when seeing that history. That's one of the
important aspects of the conflict checking that avoids requiring a
single master.
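For the curious, my understanding of the rule (just the shape of it,
not the actual Erlang): the revision with the higher generation number
wins, and ties are broken by comparing the hash part of the rev string,
so any replica that sees the same set of leaves lands on the same
winner:

```python
def pick_winner(leaf_revs):
    """Deterministically choose a winning revision among conflicting
    leaves: prefer the higher generation number (the N in "N-hash"),
    then break ties by comparing the hash part, taking the greater.
    Because the rule depends only on the revision strings themselves,
    every node that sees the same leaves picks the same winner."""
    def key(rev):
        gen, _, suffix = rev.partition("-")
        return (int(gen), suffix)
    return max(leaf_revs, key=key)
```

That determinism is exactly why a restore that preserves rev history
would replicate cleanly afterwards.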

_dump/_load endpoints have a very attractive conciseness, but my first
goal would be to figure out why it's not currently possible with the
existing API. I would prefer to maximize the power of the current
endpoints rather than create new, focused ones.
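For instance, I'd want to see how far GET
/db/_all_docs?include_docs=true on the dump side plus POST
/db/_bulk_docs with new_edits=false on the load side gets you --
new_edits=false is the same trick the replicator uses to write
documents without minting new revisions, so it should preserve _rev
values. Untested sketch, with the payload shapes being my assumption:

```python
import json

def make_load_payload(dump_rows):
    """Turn the rows of a GET /db/_all_docs?include_docs=true dump
    into a POST /db/_bulk_docs body that keeps the original _rev
    values.  "new_edits": False asks CouchDB to store the documents
    as-is instead of generating fresh revisions."""
    return json.dumps({
        "new_edits": False,
        "docs": [row["doc"] for row in dump_rows],
    })

# One row as _all_docs would (roughly) report it:
dump = [
    {"id": "foo", "key": "foo", "value": {"rev": "2-xyz"},
     "doc": {"_id": "foo", "_rev": "2-xyz", "n": 1}},
]
payload = make_load_payload(dump)
```

The caveat being that _all_docs only surfaces the winning revision of
each document, so conflicted revisions would still need to be fetched
and loaded separately.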

Paul Davis
