couchdb-dev mailing list archives

From Adam Kocoloski <>
Subject Re: data recovery tool progress
Date Tue, 10 Aug 2010 19:06:22 GMT
Good idea.  Now we've got

> [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 1048576 bytes at
> [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 1048576 bytes at
> [info] [<0.33.0>] couch_db_repair for testwritesdb - scanning 331526 bytes at 0
> [info] [<0.33.0>] couch_db_repair writing 12 updates to lost+found/testwritesdb
> [info] [<0.33.0>] couch_db_repair writing 9 updates to lost+found/testwritesdb
> [info] [<0.33.0>] couch_db_repair writing 8 updates to lost+found/testwritesdb
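
For anyone wanting to reproduce this kind of progress feedback, here is a rough Python sketch (not the Erlang in couch_db_repair). The short final chunk at offset 0 in the log suggests the file is walked backward from the end in 1048576-byte chunks; that is an inference from the output, not something stated in the thread. The names `backward_chunks` and `scan_with_progress` are made up for illustration, and the file size in the usage note is just the three logged chunk sizes added together.

```python
from typing import Callable, Iterator, Tuple

CHUNK_SIZE = 1048576  # 1 MiB, matching the chunk size seen in the log


def backward_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs covering the file from its end back to offset 0."""
    end = file_size
    while end > 0:
        start = max(0, end - chunk_size)
        yield start, end - start
        end = start


def scan_with_progress(db_name: str, file_size: int,
                       log: Callable[[str], None] = print) -> None:
    """Emit one progress line per chunk; the actual node hunting is omitted."""
    for offset, length in backward_chunks(file_size):
        log(f"couch_db_repair for {db_name} - scanning {length} bytes at {offset}")
        # Hypothetical: the real tool would search this chunk for nodes here.
```

Calling `scan_with_progress("testwritesdb", 2428678)` prints three lines, the last one covering the 331526 bytes at offset 0.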


On Aug 10, 2010, at 2:29 PM, Robert Newson wrote:

> It took 20 minutes before the first 'update' line came out, but now
> seems to be recovering smoothly. machine load is back down to sane
> levels.
> Suggest feedback during the hunting phase.
> B.
> On Tue, Aug 10, 2010 at 7:11 PM, Adam Kocoloski <> wrote:
>> Thanks for the crosscheck.  I'm not aware of anything in the node
>> finder that would cause it to struggle mightily with healthy DBs.
>> It pretty much ignores the health of the DB, in fact.  Would be
>> interested to hear more.
>> On Aug 10, 2010, at 1:59 PM, Robert Newson wrote:
>>> I verified the new code's ability to repair the testwritesdb. system
>>> load was smooth from start to finish.
>>> I started a further test on a different (healthy) database and system
>>> load was severe again, just collecting the roots (the lost+found db
>>> was not yet created when I aborted the attempt). I suspect the fact
>>> that it's healthy is the issue, so if I'm right, perhaps a warning is
>>> useful.
>>> B.
>>> On Tue, Aug 10, 2010 at 6:53 PM, Adam Kocoloski <> wrote:
>>>> Another update.  This morning I took a different tack and, rather
>>>> than try to find root nodes, I just looked for all kv_nodes in the
>>>> file and treated each of those as a separate virtual DB to be
>>>> replicated.  This reduces the algorithmic complexity of the repair,
>>>> and it looks like testwritesdb repairs in ~30 minutes or so.  Also,
>>>> this method results in the lost+found DB containing every document,
>>>> not just the missing ones.
>>>> My branch does not currently include Randall's parallelization of
>>>> the replications.  It's still CPU-limited, so that may be a
>>>> worthwhile optimization.  On the other hand, I think we may be
>>>> reaching a stage at which performance for this repair tool is 'good
>>>> enough', and pmaps can make error handling a bit dicey.
>>>> In short, I think this tool is now in good shape.
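
The "look for every kv_node" approach described above can be sketched roughly as a flat scan for a byte marker, where each hit is handled on its own instead of being linked back to a root. This is a minimal Python illustration, not the actual Erlang code: the `MARKER` pattern and the function name are assumptions, and the thread does not say how nodes are really recognized on disk.

```python
from typing import Iterator

# Hypothetical stand-in for whatever byte pattern identifies a kv_node on disk.
MARKER = b"kv_node"


def find_marker_offsets(data: bytes, marker: bytes = MARKER) -> Iterator[int]:
    """Yield the offset of every occurrence of marker in data, in file order."""
    start = 0
    while True:
        pos = data.find(marker, start)
        if pos == -1:
            return
        yield pos
        start = pos + 1  # resume just past the start of the previous hit


# Each offset would then be treated as an independent source to replicate
# into lost+found, which is why every document found ends up there.
```

A single pass like this does work proportional to the file size, which would fit the remark about reduced algorithmic complexity, though the thread gives no details on the cost of the earlier root-finding approach.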
